Preface


This book is a theoretical introduction to the optics of charged par- 
ticle beams. The purpose is to identify the most important ideas 
and derive them mathematically from first principles of physics. 
It is a teaching document, intended for an audience of students in 
the broad sense. As a science book, it focuses on basic principles in 
a connected way. It is intended for the intelligent non-expert who 
is comfortable with calculus at an advanced undergraduate level. 
Experts, including experimentalists, instrument designers, and in- 
strument users, will also find it to be a convenient reference for 
understanding the theoretical origins of the subject. 


Enormous experimental progress has been made in recent years, 
culminating in commercial availability of aberration-corrected 
transmission electron microscopes with resolution below 0.1 nm, 
energy analyzers with resolution in the meV range, and gas field 
ion microscopes with resolution below 1.0 nm, to name a few ex- 
amples. These innovations are built upon the ongoing efforts of 
pioneers over the past decades. These advances enable an ever- 
growing array of applications at the atomic scale of dimensions. 
Unfortunately, the underlying theory can appear arcane and baf- 
fling to someone who is new to the field. One cannot possibly un- 
derstand aberration correction without first having a firm grasp 
on optics in the paraxial approximation, and the origin of the pri- 
mary aberrations, for example. 


This book is intended to convey an intuitive understanding of the
basics, as opposed to presenting a comprehensive compendium of
the detailed subject. It is meant to be logical, with each step
following directly from the preceding step, insofar as this is possible.
For this reason, it is highly recommended that the reader adhere 
to the logical sequence, and make the effort to follow the mathe- 
matical steps along the way. Problems are included to amplify and 
fill in the theoretical details, and to provide practical examples. 


Many excellent books have been written over the years on this 
general topic. Indeed we have attempted to include these in the 
references. As the subject has matured, the various topics have 
been treated in increasing detail and precision in the literature. In 
order to present an up-to-date review of the subject, it is common 
practice for authors to present the main results only, referring the 
reader to a list of earlier references for detailed derivations and 
justifications. The methodology here is quite different. All of the 
ideas presented are derived from first principles of physics. In some 
instances this excludes the most recent detailed and precise results 
of others. The idea is to convey an intuitive scientific feel for the 
subject. 


It is standard practice in physics research that, if a particular prob- 
lem cannot be solved, a related problem is identified which can be 
solved. This inevitably involves approximation. This approach is 
used here in several instances, most notably in the descriptions of 
particle scattering and electron emission from solids. 


We begin with a general introduction in Chapter 1, consisting 
of a non-mathematical survey of the optical nature of a charged 
particle beam. A number of practical systems are described that 
highlight the enormous breadth and depth of present-day applica- 
tions. 


Next, Chapter 2 describes geometrical optics. This begins with a 
review of relativistic classical mechanics for the motion of a single 
particle with general charge q and rest mass m. Based on this, the 
principles underlying geometrical optics are then derived, includ- 
ing a prescription for solving for the ray path, which is the physical 
path taken by a single particle. This chapter is completely accurate
with respect to the special theory of relativity. Interestingly, this
adds no significant complexity over the historical non-relativistic 
treatments, but does lead to a more accurate mathematical de- 
scription. We therefore keep everything relativistically correct to 
the extent possible. 


Chapter 3 describes wave optics. We begin with a review of quan- 
tum mechanics, limited to only those ideas that impact the motion 
of a single charged particle. We begin with the non-relativistic ap- 
proximation and Schrödinger’s equation. Relativity is introduced 
later in the form of the Klein-Gordon equation. This skirts a 
rigorous treatment of spin, but keeps things from becoming too 
abstract, while producing a practical result. The discussion culmi- 
nates with the quantum mechanical solution for the propagation of 
the single-particle wave function in a general electromagnetic po- 
tential. The correspondence between wave optics and geometrical 
optics in the classical limit emerges naturally from this discussion. 


We then discuss diffraction and interference, starting with
Huygens’ principle, and proceeding through the scalar Helmholtz
equation, the Huygens–Fresnel relation, the Fresnel approximation, and
the Fraunhofer approximation. Next we discuss a number of useful 
examples, including formation of an image and a diffraction pat- 
tern, the general optical transformation from object to image, and 
the fundamental relationship between diffraction and Heisenberg’s 
uncertainty principle. 


Chapter 4 describes the two-body scattering problem, which is ba- 
sic to the interaction of a fast charged particle with matter. Most 
of the relevant information about the scattering process is contained
in the scattering cross section, which is derived first in the
classical approximation, and then quantum mechanically.
Chapter 5 describes electron emission as a practical consequence 
of quantum mechanics. Finally, the appendices contain two essen- 
tial mathematical topics, which are repeatedly referred to in the 
main text.


All useful information about the motion of a single charged particle
is contained in the integral of the classical Lagrangian function
between two arbitrary points in time. This integral is known his- 
torically as Hamilton’s principal function, and alternatively as the 
eikonal function. The actual path taken by the particle, chosen 
among a multiplicity of mathematically possible paths, is the path 
for which this integral has an extremum. In the important special 
case where the general electromagnetic potential has no explicit 
time dependence, the action integral reduces to a line integral of 
the canonical momentum component along the ray path. This is 
a considerable simplification in problems where one is only inter- 
ested in the spatial coordinates of a ray, without the need to know 
the arrival time at any given point. The extremum condition is 
generally known as the principle of least action, which is express- 
ible in concise and precise mathematical terms. 
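

The statements above can be written compactly in standard textbook notation. The following is a sketch only: the symbols S, W, L, A, and the scalar potential phi, as well as the endpoint labels, are our own labelling for this illustration and are not fixed by the text at this point.

```latex
% Action integral (Hamilton's principal function) between times t_1 and t_2,
% with the relativistic Lagrangian of a particle of charge q and rest mass m
S = \int_{t_1}^{t_2} L(\mathbf{r}, \mathbf{v}, t)\, dt ,
\qquad
L = -mc^{2}\sqrt{1 - v^{2}/c^{2}} + q\,\mathbf{v}\cdot\mathbf{A} - q\,\phi .

% The physical path is the one for which the action has an extremum
\delta S = 0 .

% Canonical momentum
\mathbf{p} = \frac{\partial L}{\partial \mathbf{v}}
           = \gamma m \mathbf{v} + q\,\mathbf{A},
\qquad
\gamma = \left(1 - v^{2}/c^{2}\right)^{-1/2} .

% Potentials with no explicit time dependence: the action reduces to a
% line integral of the canonical momentum along the ray path
W = \int_{P_1}^{P_2} \mathbf{p}\cdot d\mathbf{s},
\qquad
\delta W = 0 \quad \text{(at fixed total energy)} .
```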


In quantum mechanics all relevant information about the motion 
of a single particle is contained in the wave function, for which 
the same action integral in units of Planck’s constant ħ is the
phase. It follows that all possible paths in the immediate vicinity 
of the classical path interfere constructively. The classical path is 
thus the path that maximizes the probability. This clarifies the 
particle-wave duality in concise and elegant mathematical terms. 
A close analogy exists between Fermat’s principle of light optics 
and the principle of least action for a charged particle. The anal- 
ogy between light optics and charged particle optics is deep, and 
is manifested in quite practical ways, including diffraction and in- 
terference. These ideas are derived mathematically from first prin- 
ciples. 


The literature of this mature field is extensive. Several books are 
of particular interest. The three-volume set by Hawkes and Kasper 
[43, 44, 45] describes the main principles in precise and compre- 
hensive detail, with reference to the work of many authors over the 
decades. There is arguably no better review of the enormous body 
of work that brought the field to its present state. Geometrical 
Charged-Particle Optics by Rose [75] is both general and comprehensive
in its mathematical description of systems with general
curvilinear axes. This includes the straight optic axis and axial 
symmetry as special cases, and also includes the theory of cor- 
rection of geometrical aberrations. Correction of spherical aberra- 
tion in electron microscopes is described in a detailed, up-to-date 
way. This book also derives the main ideas from first principles 
of physics. The book Handbook of Charged Particle Optics, edited 
by Orloff [67], describes a variety of experimental and theoretical 
topics in a way that is accessible to readers with a range of expe- 
rience. It also describes correction of spherical aberration. 


The present book complements these in several important ways. 
It is an introductory textbook that prepares the student to tackle 
the detailed and comprehensive literature. It proceeds from first 
principles of physics in a structured way, including geometrical optics
(classical mechanics), wave optics (quantum mechanics), and
the correspondence between them. Finally, it includes several top- 
ics not normally included in other books on charged particle op- 
tics, but that are essential to practical systems. These include a 
first-principles theory of Coulomb interaction in charged particle 
beams, particle scattering by materials, and electron emission from 
materials. 


Chapter 1


Introduction: The optical nature of a charged particle beam


Modern physics teaches that all matter is made of particles which 
interact with one another. Every particle is characterized by its 
intrinsic charge, mass, and spin. These quantities govern all inter- 
actions which a particle can have. For example, an atom consists 
of a cloud of negatively charged electrons orbiting a compact, pos- 
itively charged nucleus. The establishment of this fact in quanti- 
tative terms has a fascinating history. It originates with the early 
hypothesis of Democritus, proceeds through the origins of quantitative
chemistry in the seventeenth century, and culminates with
the elucidation of quantum mechanics in the twentieth century. 
Only during the last few decades has it become possible to cap- 
ture an actual image of a single atom. 


Atoms are charge-neutral in their normal state, with the positive 
charge of the nucleus precisely offset by negative charge of the or- 
biting electrons. By bombarding an atom with a beam of light or 
charged particles, it is possible to remove one or more electrons 
from an atom or molecule. This forms a positively charged ion. 
Under special circumstances it is also possible to add electrons to 
form a negatively charged ion. Electric and magnetic fields act on
the intrinsic charge of electrons and ions through the force known
as the Lorentz force, after the physicist who first identified it in 
the nineteenth century. By bombarding with a very high energy 
beam, the atomic nucleus can dissociate into its constituent ele- 
mentary particles. This is the mechanism by which a high energy 
particle accelerator is used to probe the fundamental makeup of 
matter. 
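

As a small illustration of the Lorentz force mentioned above, the sketch below evaluates F = q(E + v × B) for given field and velocity vectors. The function name and the tuple representation of vectors are choices made for this example only; SI units are assumed throughout.

```python
def lorentz_force(q, e_field, velocity, b_field):
    """Return F = q * (E + v x B) as a 3-tuple.

    q is the charge in coulombs; e_field (V/m), velocity (m/s) and
    b_field (tesla) are 3-tuples of Cartesian components.
    """
    vx, vy, vz = velocity
    bx, by, bz = b_field
    # Cartesian components of the cross product v x B
    cross = (vy * bz - vz * by,
             vz * bx - vx * bz,
             vx * by - vy * bx)
    return tuple(q * (e + c) for e, c in zip(e_field, cross))


# Example: an electron moving along +z through a magnetic field along +x.
# The magnetic part of the force is perpendicular to both v and B.
electron_charge = -1.602176634e-19  # coulombs
force = lorentz_force(electron_charge, (0.0, 0.0, 0.0),
                      (0.0, 0.0, 1.0e7), (0.01, 0.0, 0.0))
print(force)
```

Because the magnetic term is always perpendicular to the velocity, a magnetic field alone steers the particle without changing its energy.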


Many examples of free charged particles exist in nature. Ener- 
getic ions appear as cosmic rays which pervade interstellar space, 
and bombard the earth’s atmosphere in large numbers. A large va- 
riety of subnuclear particles are produced in high energy particle 
accelerators. Many of these also appear as cosmic rays. The beam 
inside an electron microscope or a cathode ray tube consists of 
free, energetic electrons in a vacuum. Indeed, it is not difficult to 
form a beam of charged particles in a vacuum by making use of the 
intrinsic properties of matter, together with electric and magnetic 
fields to focus and steer the beam. 


According to the laws of classical physics, a single charged par- 
ticle traces out a path of motion under the influence of electric 
and magnetic fields. A collection of many particles emitted from
a source, each with its own trajectory, forms a beam.


Two common sources are shown schematically in Figure 1.1. In 
(a) a hot tungsten wire at the top of the figure, with a temperature
of about 2000 kelvin, is placed opposite a planar electrode
called the anode. The anode is typically electrically
grounded. Electrons are spontaneously emitted from the hot wire 
by the process of thermionic emission. By means of an external 
power supply, the tungsten wire is elevated to a negative voltage
which can be anywhere from a few volts to a few million
volts relative to the anode. This voltage is called the accelerating
voltage, because the resulting electric field accelerates the parti- 
cles. This forms a beam, which is analogous in several fundamental 
ways to a beam of light. Each trajectory in the figure corresponds 
to the path of a single charged particle.


Figure 1.1: (a) electron source, and (b) positive ion source. 


In (b) a tungsten wire is formed into a very sharp tip. The tip 
is elevated to a positive voltage, typically a few thousand to a few 
tens of thousands of volts, relative to the planar electrode. A small 
amount of helium gas is admitted into the system. Helium atoms 
diffuse toward the vicinity of the tip, where they are ionized in 
the very high electric field. This is known as a gas field ionization 
source. Ion sources of other chemical species exist as well. Prac- 
tically any material which can be ionized can be used to form an 
ion beam. This enables a rich variety of species of ion beams to be 
formed. 


In all cases, an electric field accelerates the charged particles. Each 
particle acquires an energy equal to its charge times the accelerating
voltage. A natural unit of energy is the electron-volt, abbreviated
as eV. It is the energy which a particle with one electronic
charge acquires when accelerated through one volt. The beam en- 
ergy is thus easily tuned to almost any desired value by simply 
controlling the accelerating voltage. This turns out to have con- 
siderable practical utility. Practical charged particle beams range 
in energy from a few eV to about fourteen trillion eV. This is the
design energy of the Large Hadron Collider (LHC) at CERN, the
world’s most energetic particle accelerator, located on the France- 
Switzerland border. Incidentally, the beam must be in a vacuum 
chamber in all useful particle beam instruments, since the parti- 
cles would immediately be absorbed in air at normal atmospheric 
pressure, regardless of their energy. 
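

The relation between accelerating voltage and beam energy can be made concrete with a short numerical sketch. It assumes a particle starting from rest, uses the standard relativistic relation between kinetic energy and speed, and takes the constants from the SI and CODATA values. The function names are ours, chosen for illustration.

```python
import math

E_CHARGE = 1.602176634e-19      # elementary charge, coulombs (exact SI value)
C_LIGHT = 299792458.0           # speed of light, m/s (exact SI value)
M_ELECTRON = 9.1093837015e-31   # electron rest mass, kg (CODATA 2018)


def kinetic_energy_ev(charge_number, accel_volts):
    """Kinetic energy in eV gained by a particle carrying charge_number
    elementary charges after falling through accel_volts."""
    return abs(charge_number) * abs(accel_volts)


def speed_fraction(kinetic_ev, rest_mass_kg):
    """Speed as a fraction of the speed of light, from the relativistic
    relation gamma = 1 + T / (m c^2)."""
    rest_ev = rest_mass_kg * C_LIGHT ** 2 / E_CHARGE
    gamma = 1.0 + kinetic_ev / rest_ev
    return math.sqrt(1.0 - 1.0 / gamma ** 2)


for volts in (10.0e3, 100.0e3, 1.0e6):
    t_ev = kinetic_energy_ev(1, volts)
    print(f"{volts:10.0f} V -> {t_ev:10.0f} eV, v/c = "
          f"{speed_fraction(t_ev, M_ELECTRON):.4f}")
```

Even at the modest 10 kV of a scanning electron microscope the electron already moves at roughly a fifth of the speed of light, which is one reason a relativistically correct treatment is worthwhile.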


A charged particle beam is conceptually similar in many respects 
to a beam of light. It is therefore interesting to think about charged 
particle optics in an analogous way to light optics. This forms a 
central theme in the present study. For example, electric and mag- 
netic fields can be configured to form a lens, which focuses the 
charged particle beam. An example of a magnetic lens is shown 
schematically in Figure 1.2. A current-carrying solenoid is depicted 
in the figure by the two rectangles, which represent the cross section.
The solenoid is surrounded by a shroud of soft iron, which
concentrates the magnetic field. The magnetic field lines bulge into
the region of the electron beam, which is incident from the top of
the figure. The beam is focused to a small probe at the target
plane, shown at the bottom of the figure. Such an arrangement is
used in a scanning electron microscope. The magnetic field lines
and the electron trajectories are generated in a computer simulation
by MEBS, Ltd. [63]. The beam path is 100 mm long in the
figure, the beam energy is 10 keV, and the solenoid carries 550
ampere-turns. In reality, the electrons spiral around the central
optic axis. The figure is plotted in a coordinate system which rotates
about the axis with the beam, so that the trajectories appear
not to rotate. This is for clarity.


Figure 1.2: Magnetic focusing of a beam of electrons.


An example of an electrostatic lens is shown schematically in Figure
1.3. Electrons are emitted from a heated flat surface at zero
volts relative potential on the left of the figure, and are accelerated
to the right. An aperture at −10 volts forms a grid to control
the total beam current. A second aperture at +600 volts forms
an extraction field for emission. Finally, a high voltage electrode
at +18,000 volts is located far to the right, out of the figure. The
apertures both have diameter 0.6 mm, and the other dimensions
in the figure scale proportionally. The curved equipotentials penetrate
the space occupied by the beam, and are separated by 100
volts in the figure. These equipotentials can be regarded as forming
a lens, which focuses the beam to a crossover at the right of
the figure. Such an arrangement is used in a cathode ray tube.
The electrostatic equipotential surfaces and the electron trajectories
are generated in a computer simulation by MEBS, Ltd. [63].


Figure 1.3: Electrostatic acceleration and focusing of a beam of
electrons.


In addition to focusing a beam to a pointlike spot, a lens can 
also be used to form a magnified image of an extended object. 
This is shown schematically in Figure 1.4. Every object point in
the plane located at z_o emits a cone of rays into the lens at plane
z_L. A particular object point is located a vertical distance r_o from
the central axis in the figure. A ray which is emitted in a direction
parallel with the central axis is deflected by the lens, and intersects
the central axis at the focal point located an axial distance f from
the lens. A second ray passes through the center of the lens, and
is undeflected. These two rays are depicted as bold lines in the
figure. They intersect in a distant plane located at z_I, at a point
which is located at a distance r_I from the central axis. In fact, all
rays emitted from this object point, regardless of their angle of
emission, are focused by the lens to the same image point. Since
all rays converge to a single point, it is apparent that a one-to-one
mapping of the object point into an image point exists. In order
for this to happen, each ray must experience a change in slope
which is proportional to the distance from the central axis. This
is the remarkable focusing action of an ideal lens.


Figure 1.4: Imaging an off-axis point by a lens.


Since this works for any point in the object plane z_o, we deduce
that all object points are imaged simultaneously, each to a unique
point in the image plane. This is the mechanism by which a magnified
image of an extended object is formed. The negative of the
ratio of r_I to r_o is called the magnification of the image relative
to the object. By convention, the magnification is negative in this
case, because the image is inverted relative to the object. By performing
the construction in Figure 1.4 for multiple object points
r_o, it is easy to convince oneself that this magnification is the
same for all object points. The magnification depends only on the
relative positions of the object plane z_o and the lens plane z_L, and
on the focal length f. The smaller the focal length f, the more the
rays are deflected, and the stronger is the lens. The focal length
is the same for all object points r_o. For a charged particle beam,
the focal length also depends on the particle energy. The higher
the particle energy, the longer is the focal length. This is a direct
result of the fact that a faster particle spends less time in the lens
field, and is therefore deflected less than a slower particle.
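

The ray construction described above can be checked numerically. The sketch below models an ideal thin lens as an element that changes each ray's slope by an amount proportional to its height at the lens, and confirms that rays leaving one object point at different angles arrive at the same image point. The function and parameter names are hypothetical, and the example assumes the object lies farther from the lens than one focal length, so that a real image forms.

```python
def image_of_point(z_object, z_lens, focal_length, r_object, slopes):
    """Trace rays from an off-axis object point through an ideal thin lens.

    Returns the image plane position, the magnification, and the list of
    heights at which the individual rays cross the image plane.
    """
    u = z_lens - z_object                          # object distance
    v = 1.0 / (1.0 / focal_length - 1.0 / u)       # from 1/u + 1/v = 1/f
    z_image = z_lens + v
    magnification = -v / u
    landing_heights = []
    for slope in slopes:
        r_at_lens = r_object + slope * u           # straight line to the lens
        slope_out = slope - r_at_lens / focal_length   # ideal-lens deflection
        landing_heights.append(r_at_lens + slope_out * v)
    return z_image, magnification, landing_heights


# Object 30 units before a lens of focal length 20, point 1 unit off axis.
z_i, m, heights = image_of_point(0.0, 30.0, 20.0, 1.0,
                                 [-0.05, -0.02, 0.0, 0.02, 0.05])
print(z_i, m, heights)
```

All rays land at the same height, equal to the magnification times the object height, and the sign of the magnification reflects the inverted image.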


The construction in Figure 1.4 works for both charged particles 
and light. Many striking similarities exist between light optics and 
charged particle optics. In both cases, no optical system is capable 
of forming a perfect image. Blur and distortion are always present 
to some degree. These imperfections are called aberrations. An 
important example is the so-called spherical aberration, in which 
the outermost rays are focused more strongly than the innermost
rays. As a result, the beam is not focused to a point, but rather
is blurred. This is readily apparent in Figure 1.3. Spherical aber- 
ration occurs in light optical lenses as well as charged particle 
lenses. In light optics, it arises from the fact that ordinary lenses 
have spherical surfaces, hence the name spherical aberration. It is 
substantially corrected in light optics by grinding the lens surfaces 
to a particular aspherical shape. It is not possible to shape the elec- 
tric and magnetic fields of a charged particle lens in an analogous 
way, because the fields always obey Maxwell’s equations. Signifi- 
cant progress has been made over the last two decades in correcting 
the aberrations of charged particle lenses. The details are beyond 
the scope of this study. The reader is referred to two excellent 
references by Rose [75] and by Krivanek et al. [54] for precise
details. Indeed, it is hoped that the present study will provide the 
background needed to approach this advanced topic expeditiously. 


It is apparent from Figure 1.3 that the innermost rays close to 
the central axis are less aberrated than the outermost rays. Se- 
lecting the inner rays and blocking the outer rays would improve 
the quality of the focusing. This suggests a simple way of mitigat- 
ing the effect of the spherical aberration for a given optical system, 
namely, by using an aperture to admit the inner rays, while block- 
ing the outer rays. Conceptually, one could add an aperture in the 
lens plane of Figure 1.4, thus limiting the cone of rays. A conve- 
nient measure of the constriction is given by the index of refraction 
times the sine of the angle which the extreme ray makes with the 
central axis at an object point on the axis. This product is known 
as the numerical aperture. The larger the aberration, the smaller 
the numerical aperture must be to obtain the desired image qual- 
ity. In fact, the size of the numerical aperture can be used as a 
useful estimate of the quality of the optical system. In practice, 
the numerical aperture is typically in the range of 0.3 to 1.3 for 
light optical lenses, and 0.001 to 0.1 for charged particle lenses, 
in order to achieve optimal imaging conditions. This expresses the 
fact that charged particle lenses have significantly worse aberra- 
tion than light optical lenses. 
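

To see how strongly an aperture mitigates spherical aberration, one can use the standard third-order estimate for the diameter of the disc of least confusion, 0.5 Cs alpha^3, where Cs is the spherical aberration coefficient and alpha is the beam semi-angle. This formula is not derived in this chapter, and the value of Cs used below is a hypothetical round number chosen only for illustration.

```python
def spherical_blur_diameter(cs_meters, semi_angle_rad):
    """Diameter of the disc of least confusion due to third-order
    spherical aberration, d = 0.5 * Cs * alpha**3 (meters)."""
    return 0.5 * cs_meters * semi_angle_rad ** 3


CS_EXAMPLE = 1.0e-3  # hypothetical aberration coefficient of 1 mm

for alpha in (0.02, 0.01, 0.005):
    blur = spherical_blur_diameter(CS_EXAMPLE, alpha)
    print(f"semi-angle {alpha:.3f} rad -> blur {blur * 1e9:.4f} nm")
```

Because the blur grows as the cube of the semi-angle, halving the aperture reduces the aberration blur by a factor of eight.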


Classical mechanics regards a single particle as a hypothetical
point, with the position and velocity known in principle at any 
given instant in time. In reality, a single particle also behaves like 
a wave. The wavelength is equal to Planck’s constant h divided
by the particle momentum, where h = 6.6261 × 10^-34 joule-sec. A
faster particle thus has a shorter wavelength than a slower particle.
This so-called wave-particle duality is a hallmark of quantum
mechanics, which is a more accurate description of nature than 
classical mechanics on the atomic and subatomic scale of dimen- 
sions. Classical mechanics is sufficiently accurate for many pur- 
poses, however, so it is worth retaining. Quantum mechanics has a 
very specific correspondence with classical mechanics for a charged 
particle in the limit of high energy. This will prove to be a central 
theme in the present study. 
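

The wavelength relation above is easily evaluated for an electron of given kinetic energy. The sketch below uses the relativistic momentum, pc = sqrt(T (T + 2 m c^2)), with constants taken from the SI and CODATA values. The function name is ours.

```python
import math

H_PLANCK = 6.62607015e-34       # Planck's constant, joule-sec (exact SI value)
E_CHARGE = 1.602176634e-19      # elementary charge, coulombs (exact SI value)
C_LIGHT = 299792458.0           # speed of light, m/s (exact SI value)
M_ELECTRON = 9.1093837015e-31   # electron rest mass, kg (CODATA 2018)


def de_broglie_wavelength(kinetic_ev, rest_mass_kg):
    """Wavelength h / p in meters for a particle of given kinetic energy,
    using the relativistic momentum p = sqrt(T * (T + 2 m c^2)) / c."""
    t_joule = kinetic_ev * E_CHARGE
    rest_joule = rest_mass_kg * C_LIGHT ** 2
    momentum = math.sqrt(t_joule * (t_joule + 2.0 * rest_joule)) / C_LIGHT
    return H_PLANCK / momentum


for energy_ev in (10.0e3, 100.0e3, 300.0e3):
    wavelength = de_broglie_wavelength(energy_ev, M_ELECTRON)
    print(f"{energy_ev / 1e3:6.0f} keV -> {wavelength * 1e12:.3f} pm")
```

The resulting wavelengths are a few picometers, far smaller than the spacing between atoms in a solid.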


Quantum mechanics teaches that the absolute square of the wave 
amplitude is equal to the probability that a single measurement 
finds the particle at a given position at any given instant in time. 
Because this probability is described by a propagating wave, it is 
not possible to know the position and momentum simultaneously 
with perfect precision. This is known as the Heisenberg uncer- 
tainty principle, after the physicist who first elucidated it in the 
1920s. A remarkable consequence of quantum mechanics, and one 
which may appear counterintuitive at first, is that a single particle 
can be described by two or more waves which interfere construc- 
tively or destructively with one another. Each wave corresponds to 
a particular alternative path of motion of the particle, where the 
actual path of motion is fundamentally unknowable. For example, 
it is impossible to know which path in Figure 1.4 is the actual path 
taken by the charged particle. Each possible path can be described 
by a separate wave, where all of the waves corresponding to the 
different paths propagate coherently, with a particular phase rela- 
tionship to one another. They all interfere at the image plane to 
cause a blurred spot (not depicted in the figure). 


This interference is intimately related to diffraction, which results
from the propagation, spreading, and interference of waves.
Diffraction is familiar in light optics. For example, it imposes a
fundamental limit on the resolution of a microscope. Because of 
diffraction, it is not possible in a conventional microscope to re- 
solve any object which is appreciably smaller than the wavelength. 
This turns out to be true for both a light microscope and an elec- 
tron microscope. It is another example of the close analogy that 
exists between charged particle optics and light optics. Since the 
wavelength of a fast charged particle is much smaller than that of 
visible light, it is expected that the resolving power of a charged 
particle microscope should be much better than a light microscope. 
This is indeed verified in practice. A modern electron microscope 
can resolve a single atom, a feat which is in no way possible with 
visible light. 
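

A rough comparison of the two cases can be made with the Rayleigh criterion, 0.61 times the wavelength divided by the numerical aperture. This is one common measure of diffraction-limited resolution, adopted here as an assumption. The wavelengths and numerical apertures below are representative values chosen for illustration, not measurements.

```python
def rayleigh_resolution(wavelength_m, numerical_aperture):
    """Diffraction-limited resolution by the Rayleigh criterion,
    d = 0.61 * wavelength / numerical aperture (meters)."""
    return 0.61 * wavelength_m / numerical_aperture


# Visible light at 500 nm with a high numerical aperture objective.
light = rayleigh_resolution(500.0e-9, 1.3)

# Electron wavelength of 3.70 pm (roughly 100 keV) with a small aperture.
electron = rayleigh_resolution(3.70e-12, 0.01)

print(f"light microscope:    {light * 1e9:.1f} nm")
print(f"electron microscope: {electron * 1e9:.3f} nm")
```

Even with a numerical aperture a hundred times smaller, the electron beam comes out far ahead, because its wavelength is shorter by about five orders of magnitude.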


Charged particles interact strongly with matter. This forms the 
basis of many useful instruments. For example, a fast electron can 
be scattered by an atomic nucleus of the target material, with 
little energy transferred to the material. This is known as elastic 
scattering, and forms the basis of contrast in a transmission elec- 
tron microscope. Alternatively, the incident particle can transfer 
energy to the sample material, giving rise to secondary processes. 
For example, a secondary electron or ion can be ejected. By mea- 
suring the charge and mass of the ejected particle, useful chemical 
and physical information about the sample is obtained. 


Three generic types of electron microscopes exist. These are shown 
schematically in Figure 1.5. A conventional transmission electron 
microscope (TEM) is shown in (a). A transparent specimen S is
illuminated from above, where the illumination is omitted for sim- 
plicity. Some electrons are elastically scattered at the object point, 
and some remain unscattered. The unscattered current passes 
through an aperture A, and is imaged by a lens L onto the recording
plane P, where P typically consists of an array of charge-coupled
devices. Some fraction of the scattered current is stopped
by the aperture A. Areas of the specimen which scatter strongly 
thus appear dark in the image, and areas which scatter weakly 
appear bright. The object point is depicted as being off the central
axis. Actually, all object points in the specimen S are imaged
simultaneously.


Figure 1.5: Types of electron microscopes, schematic.


A scanning transmission electron microscope (STEM) is shown 
in (b). An aperture A is illuminated from above. The transmitted 
current is focused by a lens L onto a transparent specimen S, and is 
scanned sequentially over the specimen in a raster pattern. Again, 
some electrons are elastically scattered by the specimen, and some 
remain unscattered. Some fraction of the scattered current is mea- 
sured on the annular dark field detector D, and the resulting signal 
sent to a display which is scanned synchronously with the beam of 
the microscope. Areas of the specimen which scatter strongly thus 
appear bright on the display, and areas which scatter weakly ap- 
pear dark. This is not an image in the sense of Figure 1.4, because 
the intensity of each pixel is determined sequentially. However, 
it does produce an intensity map of the specimen which is just 
as useful as an optically formed image. The current which passes 
through the annular dark field detector D is measured on a bright 
field detector B. This signal can alternatively be displayed, with 
strongly scattering regions appearing dark, and weakly scattering
regions appearing bright. The ultimate resolution of the TEM
and STEM is the same, but the contrast differs in the two cases,
depending on the accelerating voltage and numerical aperture cho- 
sen. In both cases, the numerical aperture is equal to the beam 
semi-angle subtended by the aperture A at the specimen S. 


A scanning electron microscope (SEM) is shown in (c). An aperture
A is illuminated from above. The transmitted current is focused
by a lens L onto an opaque bulk specimen S, and is scanned
sequentially over the specimen in a raster pattern. Low energy sec- 
ondary electrons are excited by the beam in the interaction volume 
depicted by the darker area of the specimen S. These secondary 
electrons are accelerated to a collector C, and the current thus 
detected is used to form a signal which is sent to the display. 


The ultimate resolution of the SEM is roughly equal to the size 
of the interaction volume, which is typically on the order of a few 
nanometers. One nanometer is one-billionth of a meter, and will 
be abbreviated 1 nm throughout the text. This is a very good res- 
olution, compared with a typical light microscope, for which the 
resolution is typically a few hundred nm. In addition, an SEM 
has superior depth of focus. This means that one need not focus 
precisely, in order to obtain a sharp image, allowing a seemingly 
three-dimensional depiction of a bulk sample. This is shown in Fig- 
ure 1.6, courtesy of L.T. Varghese and L. Fan, Purdue University 
[90]. 


The schematic depiction in Figure 1.5(c) applies equally well to 
a scanning ion microscope (SIM), where the beam consists of ions 
rather than electrons. A bright source of helium ions can be formed 
using a sharp tip in a low pressure helium gas. The tip is elevated 
to a potential of a few tens of kilovolts relative to the surrounding 
chamber, causing a high electric field around the tip. Helium gas 
atoms are polarized in the field gradient, and attracted to the tip, 
where they are ionized to form positive helium ions. The ions are
accelerated away from the tip by the electric field to form the ion 
beam.


Figure 1.6: SEM picture of self-assembled silica spheres, showing
high depth of focus.


A scanning helium ion micrograph is shown in Figure 1.7, obtained 
using an Orion SIM available commercially from Carl Zeiss SMT. 
A scanning electron micrograph of the same specimen is shown
for comparison, obtained using a Leo SEM also available commercially
from Carl Zeiss SMT. The full-scale vertical dimension
is 6 µm, and the beam energy is 20 keV in both cases. The helium
ion micrograph shows striking surface detail. This is due to
the fact that the helium ions are stopped within a few tens of 
nanometers of the surface, while the electrons penetrate several 
microns into the material. As a result, the material appears more 
transparent to electrons than to helium ions. The electron micro- 
graph shows more contrast due to the different materials present. 
This is due to the fact that materials with high atomic number and 
high mass density preferentially scatter the electrons much more 
strongly than low atomic number and low mass density materials. 
This gives rise to high material contrast in the scanning electron 
microscope. 


Figure 1.7: (Top) Scanning helium ion micrograph, (bottom) scanning 
electron micrograph. 


The ultimate resolution of the TEM and STEM is limited by 
spherical aberration and diffraction. The spherical aberration can 
be substantially corrected in a modern TEM and STEM, making 
both instruments capable of resolution in the range of 0.05 nm. 
This is more than sufficient to form an image of a single atom. 
An example of a corrected STEM image is shown in Figure 1.8. 
The specimen is graphene, which consists of one or more atomic 
layers of graphite. A single layer of graphene is one atomic layer 
of carbon in its hexagonal crystalline form. The image on the left 
is a single scan recorded at 60 kV accelerating voltage in a Nion 
aberration-corrected STEM. The bright spots are single carbon 
atoms with nearest-neighbor spacing of 0.14 nm. The image on 
the right is derived by digitally superimposing 350 different areas 
of the larger image, with each area consisting of 128 x 128 pixels. 
This averages out the noise in the individual scans, without 
having to resort to smoothing algorithms. (The individual pixels 
are visible in the two images.) This annular dark field image is 
remarkable in several respects. First, single atoms of carbon are 
clearly resolved with resolution better than 0.1 nm. Second, the 
atomic number of carbon is six, which is low relative to most solid 
materials. The specimen is therefore weakly scattering everywhere, 
thus limiting the available contrast. The fact that the contrast is 
adequate is remarkable. Third, the fragile graphene structure is 
undamaged by the beam. It would not be possible to obtain such 
an image without spherical aberration correction. 


Figure 1.8: Aberration-corrected STEM images of graphene. 


Figure 1.9: SIMS images of chromosomes. 


Alternatively, a beam of ions can be used to perform chemical 
analysis of a material. The ion beam is focused and scanned over 
the surface of the material to be analyzed. Atoms are removed from 
the surface and ionized. These secondary ions then pass through a 
spectrometer which separates the various ionic species according to 
their masses. An image is formed synchronously, consisting of any 
chosen individual chemical species. This is called Secondary Ion 
Mass Spectrometry or SIMS. An example is shown in Figure 1.9 
[58]. The sample consists of chromosomes of the fruit fly Drosophila 
melanogaster. Four different chemical species are shown, giving 
detailed spatially resolved chemical information about the chro- 
mosomes. These images were obtained using the SIMS tool at the 
University of Chicago, which uses a focused primary ion beam con- 
sisting of gallium ions from a liquid metal ion source. This method 
can be used to analyze an enormous variety of samples at the mi- 
croscopic level. 


Alternatively, an electron or ion beam can physically or chemically 
alter the target material locally. The writing substrate is coated 
with a thin film of organic material. Bombarding the film with a 
focused electron or ion beam renders the film either more soluble 
(positive-tone process) or less soluble (negative-tone process) in 
the developer. The organic film is thus patterned, and forms a bi- 
nary mask for subsequent process steps. Creating fine patterns on 
a substrate is commonly referred to as lithography. An enormous 
variety of useful devices can be fabricated with high areal density 
and very small feature sizes. 


Two patterns written by electron beam lithography are shown in 
Figure 1.10. The top pattern shows the negative-tone resist which 
is left behind after the development step. It is an electronic circuit 
pattern with 30 nm features, courtesy of Vistec Lithography. The 
bottom pattern shows pillars of silicon which are 0.5 µm in diameter 
and 1.5 µm high. They were written using a Vistec SB352 HR 
electron beam system, courtesy of IMS CHIPS, Stuttgart, Ger- 
many. 


A focused electron beam is the smallest, finest practical writing 
pencil known. An arbitrary pattern can be created and stored 
using standardized computer-aided design software, and subse- 
quently transmitted to the electron beam writer for one or more 
exposures. This flexibility, together with the high resolution, makes 
electron beam lithography the method of choice for creating patterns 
on the nanometer scale of dimensions in low volume. 


Figure 1.10: Electron beam lithography patterns. 


In summary, the inherent high resolution and the unique 
interactivity with matter constitute two fundamental advantages 
of charged particle beams. They make charged particle 
beams indispensable to science and technology on the nanometer 
scale of dimensions. With this introduction, we are now in a position 
to begin our analytical study in detail. 


Chapter 2 


Geometrical optics 


Geometrical optics of charged particle beams begins with rela- 
tivistic classical mechanics, specifically, the motion of a charged 
particle in the presence of external electric and magnetic fields. 
The fields exert an instantaneous resultant force on the particle, 
which determines the path of motion. Mathematically, the solution 
consists of finding the three-vectors for position x and velocity v 
at any time t, given initial values at time zero, taking account of 
the influence of the fields. 


Having found a prescription for solving a general particle trajec- 
tory, we can then apply this to families of trajectories. This permits 
us to delineate the geometrical optical properties of a beam of par- 
ticles. We begin with a review of relativistic classical mechanics, 
focusing only on those specific topics which will lead directly to 
geometrical optics. 


2.1 Relativistic classical mechanics 


In classical mechanics, a system is described by one or more generalized 
coordinates $Q_i$, where 


$$\{Q_i\} = Q_1, Q_2, \ldots, Q_n, \tag{2.1}$$


and n is the number of degrees of freedom needed to completely 
specify the system. For example, a three-dimensional Cartesian 
coordinate system can be used to completely specify position in 
ordinary space, and has three degrees of freedom. 


The $Q_i$ evolve under the influence of forces, and therefore depend 
implicitly on the time $t$. There exist velocities $\dot{Q}_i$ given by 


$$\{\dot{Q}_i\} = \dot{Q}_1, \dot{Q}_2, \ldots, \dot{Q}_n, \tag{2.2}$$

where the dot denotes differentiation with respect to time, i.e., 

$$\dot{Q}_i = \frac{d}{dt} Q_i. \tag{2.3}$$


This is quite general, since n can take on any positive integral 
value. For example, a system of N interacting particles has n = 3N 
degrees of freedom. 


The central problem of classical mechanics can be stated as follows: 
given a set of coordinates $Q_i$ and velocities $\dot{Q}_i$ at an initial 
time $t_0$, calculate the $Q_i$ and $\dot{Q}_i$ at any time $t$. The result of this 
calculation represents a complete specification of the system. In 
the present study, we will confine our attention to a single particle 
of rest mass $m$ and charge $q$ under the influence of electric 
and magnetic forces. We therefore define generalized coordinates 
$x_i = (x_1, x_2, x_3)$ with corresponding velocities $v_i = (v_1, v_2, v_3)$, 
where the six-vector components are functions of time $t$. In this 
case the central problem is to calculate these quantities. The prescription 
is general with respect to the choice of coordinate systems. 
For example, one could use Cartesian, cylindrical, spherical, 
or other coordinates with three degrees of freedom to specify the 
position. The reader is referred to the book by Goldstein [35] for 
a thorough and detailed discussion of classical mechanics. 


2.1.1 Hamilton’s principle of least action 


We seek a general condition governing the motion of a particle 
with charge q and rest mass m in external electric and magnetic 
fields. We require that this condition be covariant with respect 
to the Lorentz transformation of special relativity. This ensures 
that the equations of motion have the same form in all frames of 
reference in uniform motion with respect to one another. To this 
end, following Goldstein et al. [35], we define a function $\mathcal{L}$, called 
the invariant Lagrangian, as 


$$\mathcal{L} = \sum_{\mu=1}^{4} \left( m\,U_\mu U_\mu + q\,A_\mu U_\mu \right). \tag{2.4}$$


Here $U_\mu$ and $A_\mu$ are the four-vector velocity and electromagnetic 
potential, respectively, given by 


$$U_\mu = (\gamma\mathbf{v},\; i\gamma c), \qquad A_\mu = (\mathbf{A},\; i\phi/c), \tag{2.5}$$


where $\mathbf{v}$ is the three-vector particle velocity, $\mathbf{A}$ is the magnetic 
three-vector potential, and $\phi$ is the electrostatic scalar potential. 
We have defined a quantity $\gamma$ as 


$$\gamma = \frac{1}{\sqrt{1 - v^2/c^2}}. \tag{2.6}$$


We notice from the form of (2.4) that the invariant Lagrangian $\mathcal{L}$ is 
a sum of inner products of two four-vectors. It is straightforward to 
show that the inner product of two four-vectors is invariant under 
a Lorentz transformation. It follows that $\mathcal{L}$ is Lorentz invariant, 
and has the same value in every uniformly moving reference frame. 
The proof of this is left to the reader in the problems. 


All relevant information about the magnetic and electric fields 
is contained in the magnetic vector potential $\mathbf{A}(\mathbf{x},t)$ and electrostatic 
scalar potential $\phi(\mathbf{x},t)$, respectively. In general they are 
functions of position $\mathbf{x}$ and time $t$, measured in the particular reference 
frame of interest. The potentials arise from source currents 
and charges which are distributed in proximity to the charged particle 
of interest. They also include the effects of magnetic materials 
and dielectrics. We assume in the following analysis that the potentials 
$\mathbf{A}(\mathbf{x},t)$ and $\phi(\mathbf{x},t)$ are known. The reader is referred to a 
definitive text by Jackson [48], which describes how to calculate 
these potentials, given a known distribution of charges, currents, 
conductors, dielectrics, and magnetic materials. 


At this point we form a key postulate, namely, for physically allowable 
motion of the particle, the integral of $\mathcal{L}$ over time is stationary 
with respect to first-order variation as follows: 


$$\delta \int_{\tau_a}^{\tau_b} \mathcal{L}\, d\tau = 0, \tag{2.7}$$


where $\tau$ is the time measured in the rest frame of the particle, commonly 
known as the proper time. We assume that the end times $\tau_a$ 
and $\tau_b$ remain fixed with respect to the variation. This expression 
is also Lorentz invariant, because it is constructed wholly from 
Lorentz-invariant quantities. 


It is possible to construct a general covariant theory which de- 
scribes the motion in every reference frame. However, for our pur- 
pose here we are interested in the particle motion in a single ref- 
erence frame which is at rest relative to the laboratory, commonly 
known as the lab frame. It greatly simplifies the discussion if we 
confine our attention to this single frame. In the lab frame we can 
express (2.7) in the equivalent form 


$$\delta \int_{t_a}^{t_b} L\, dt = 0, \tag{2.8}$$


where we have defined $L = \mathcal{L}/\gamma$ and $t = \gamma\tau$ as the Lagrangian 
and time, respectively, expressed in the lab frame. The time $t$ is 
related to the proper time $\tau$ by a Lorentz transformation, where 
we assume the particle coordinate is zero in the particle rest frame. 
Substituting (2.5) into (2.4), it follows that 


$$L(\mathbf{x}, \mathbf{v}; t) = -mc^2\sqrt{1 - v^2/c^2} + q\,\mathbf{v}\cdot\mathbf{A}(\mathbf{x},t) - q\,\phi(\mathbf{x},t) \tag{2.9}$$


in the lab frame. We have made use of the vector notation $\mathbf{v}\cdot\mathbf{A}$ 
to express the inner product of the two three-vectors $\mathbf{v}$ and $\mathbf{A}$. In 
Cartesian coordinates this is $\mathbf{v}\cdot\mathbf{A} = v_x A_x + v_y A_y + v_z A_z$. 


The Lagrangian L is a scalar function of the position x, and the 
velocity v. The time t is regarded as a parameter which uniquely 
specifies a point along the particle trajectory. The position and 
velocity depend implicitly on the time. Indeed, the central prob- 
lem is to solve for this dependence. In the case where the electric 
and magnetic fields vary with time, the electromagnetic potentials 
have explicit time dependence. For static fields, these potentials 
have no explicit time dependence. The Lagrangian therefore has 
no explicit time dependence in this case. 


The integral in (2.8) can be abbreviated as 

$$S_{ab} = \int_{t_a}^{t_b} L(\mathbf{x}, \mathbf{v}; t)\, dt. \tag{2.10}$$


It is a scalar quantity with units of energy times time, or action. 
The integral $S_{ab}$ is therefore known as the action integral. The expression 
(2.8) says that the action integral has an extremum for 
the physically allowable trajectory. This trajectory exists among 
many hypothetical trajectories, each displaced infinitesimally from 
the physical trajectory. The expression (2.8) is known as Hamil- 
ton’s principle of least action. 
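
Hamilton's principle can be illustrated numerically. The following is a minimal sketch, assuming a free particle (A = 0, φ = 0) moving in one dimension, in units where m = c = 1. The function names, the sinusoidal trial perturbation, and the step count are illustrative choices and do not come from the text. The sketch evaluates a discretized version of the action integral (2.10) along the straight path and along nearby paths with the same endpoints and end times.

```python
import math


def free_lagrangian(v):
    """Lab-frame Lagrangian (2.9) with A = 0 and phi = 0, in units m = c = 1."""
    return -math.sqrt(1.0 - v * v)


def action(eps, v0=0.5, t_total=1.0, n_steps=2000):
    """Discretized action integral (2.10) along the trial path
    x(t) = v0*t + eps*sin(pi*t/t_total).
    The perturbation vanishes at both end times, so the endpoints stay fixed."""
    dt = t_total / n_steps

    def x(t):
        return v0 * t + eps * math.sin(math.pi * t / t_total)

    total = 0.0
    for k in range(n_steps):
        # Velocity on each segment from a finite difference of the path.
        v = (x((k + 1) * dt) - x(k * dt)) / dt
        total += free_lagrangian(v) * dt
    return total


s_straight = action(0.0)
excess_1 = action(0.01) - s_straight
excess_2 = action(0.02) - s_straight

print("action on straight path:", s_straight)
print("excess action, eps = 0.01:", excess_1)
print("excess action, eps = 0.02:", excess_2)
print("ratio of excesses:", excess_2 / excess_1)
```

Under these assumptions, doubling the size of the perturbation should roughly quadruple the excess action. This indicates that the first-order variation vanishes on the straight path, which is the content of (2.8) for this simple case.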


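Forming a Taylor expansion of the variation (2.7) in the lab frame, 
and retaining only terms to first order in $\delta x_i$ and $\delta v_i$, we find 


$$\delta \int_{t_a}^{t_b} L\, dt = \int_{t_a}^{t_b} \sum_{i=1}^{3} \left( \frac{\partial L}{\partial x_i}\,\delta x_i + \frac{\partial L}{\partial v_i}\,\delta v_i \right) dt. \tag{2.11}$$


Making use of the chain rule for partial derivatives, we have 


$$\frac{d}{dt}\left( \frac{\partial L}{\partial v_i}\,\delta x_i \right) = \delta x_i\,\frac{d}{dt}\left( \frac{\partial L}{\partial v_i} \right) + \frac{\partial L}{\partial v_i}\,\delta v_i, \tag{2.12}$$


where $\delta v_i = (d/dt)\,\delta x_i$. It follows (2.11, 2.12) that 


$$\delta \int_{t_a}^{t_b} L\, dt = \sum_{i=1}^{3} \left[ \frac{\partial L}{\partial v_i}\,\delta x_i \right]_{t_a}^{t_b} + \sum_{i=1}^{3} \int_{t_a}^{t_b} \left( \frac{\partial L}{\partial x_i} - \frac{d}{dt}\frac{\partial L}{\partial v_i} \right) \delta x_i\, dt = 0. \tag{2.13}$$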

Now $\delta t_a = \delta t_b = 0$, because the end times $t_a$ and $t_b$ are assumed to 
be fixed. This in turn demands $\delta x_i = v_i\,\delta t = 0$ at the end times $t_a$ 
and $t_b$. Consequently, the first term on the right of (2.13) is zero. 
Since $\delta x_i$ inside the integral is arbitrary, it is a necessary condition 
that 


$$\frac{\partial L}{\partial x_i} - \frac{d}{dt}\frac{\partial L}{\partial v_i} = 0, \tag{2.14}$$

where $i = 1, \ldots, 3$. This is a set of three coupled equations, known 
as the Euler-Lagrange equations of motion. Given the Lagrangian 
(2.9) and the initial conditions for position $x_i$ and velocity $v_i$ at 
time zero, these equations can be solved in principle for the components 
$x_i$ and $v_i$ as functions of time. This represents a solution 
to the central dynamical problem for a single particle. We will investigate 
the solution in more detail in the coming sections. 


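It is straightforward to show (2.9, 2.14) that 


$$\frac{d}{dt}(\gamma m \mathbf{v}) = q\,(\mathbf{E} + \mathbf{v}\times\mathbf{B}), \tag{2.15}$$


where we have defined the three-vector electric and magnetic fields, 
respectively, as 


$$\mathbf{E} = -\nabla\phi - \frac{\partial \mathbf{A}}{\partial t}, \qquad \mathbf{B} = \nabla\times\mathbf{A}, \tag{2.16}$$


and we have made use of the total time derivative 


$$\frac{d}{dt} = \frac{\partial}{\partial t} + \mathbf{v}\cdot\nabla. \tag{2.17}$$


The proof of (2.15) is left to the reader in the problems. 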


We define the three-vector kinetic momentum $\mathbf{p}$ as 

$$\mathbf{p} = \gamma m \mathbf{v}. \tag{2.18}$$


Equation (2.15) is an expression of Newton’s law of motion for a 
charged particle, where the left side is the time rate of change of 
the kinetic momentum, and the right side is known as the Lorentz 
force. 


In principle it is possible to calculate all of particle optics by solv- 
ing (2.15), for the position x and the velocity v as functions of 
time t, but further considerations will lead to a more detailed un- 
derstanding, and to greater computational efficiency. 
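
As an illustration of solving (2.15) directly, the following is a minimal sketch, assuming an electron in a uniform static magnetic field along z with no electric field. It integrates the equation of motion with a classical fourth-order Runge-Kutta step and compares the orbit with the expected radius p/(|q|B). The function names, the choice of integrator, and the values 100 keV and 0.01 T are assumptions made for this example only.

```python
import math

C = 299_792_458.0            # speed of light (m/s)
M_E = 9.1093837015e-31       # electron rest mass (kg)
Q_E = -1.602176634e-19       # electron charge (C)


def derivative(state, q, m, bz):
    """Right side of (2.15) with E = 0 and B = (0, 0, bz).
    state = (x, y, px, py), with p the kinetic momentum (2.18)."""
    _, _, px, py = state
    gamma = math.sqrt(1.0 + (px * px + py * py) / (m * C) ** 2)
    vx, vy = px / (gamma * m), py / (gamma * m)
    # dx/dt = v and dp/dt = q v x B
    return (vx, vy, q * vy * bz, -q * vx * bz)


def rk4_step(state, dt, q, m, bz):
    """One classical Runge-Kutta step for the state tuple."""
    def shifted(s, k, h):
        return tuple(si + h * ki for si, ki in zip(s, k))

    k1 = derivative(state, q, m, bz)
    k2 = derivative(shifted(state, k1, dt / 2), q, m, bz)
    k3 = derivative(shifted(state, k2, dt / 2), q, m, bz)
    k4 = derivative(shifted(state, k3, dt), q, m, bz)
    return tuple(s + dt / 6 * (a1 + 2 * a2 + 2 * a3 + a4)
                 for s, a1, a2, a3, a4 in zip(state, k1, k2, k3, k4))


def orbit_diameter(kinetic_ev, bz, q=Q_E, m=M_E, n_steps=2000):
    """Integrate one full gyration starting at the origin.
    Returns (largest distance from the start, expected radius, final state)."""
    t_joule = kinetic_ev * abs(Q_E)                  # kinetic energy (J)
    p0 = math.sqrt(t_joule ** 2 + 2 * t_joule * m * C ** 2) / C
    gamma = 1.0 + t_joule / (m * C ** 2)
    period = 2 * math.pi * gamma * m / (abs(q) * bz)
    dt = period / n_steps
    state = (0.0, 0.0, p0, 0.0)
    farthest = 0.0
    for _ in range(n_steps):
        state = rk4_step(state, dt, q, m, bz)
        farthest = max(farthest, math.hypot(state[0], state[1]))
    return farthest, p0 / (abs(q) * bz), state


diameter, radius, final_state = orbit_diameter(1.0e5, 0.01)
print("expected radius p/(|q|B) in m:", radius)
print("orbit diameter from integration in m:", diameter)
```

In this sketch the largest distance from the starting point should equal twice the expected radius, and the particle should return to its starting point after one period. The magnetic force changes only the direction of the momentum, so the magnitude of p stays constant along the orbit.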


Problems 


1. An arbitrary four-vector $A_\mu = (A_1, A_2, A_3, A_4)$ is defined in 
terms of its four components. For two reference frames in relative 
uniform motion with velocity $v$ along the z-direction, the components 
of $A_\mu$ are related in the two frames by 


$$\begin{aligned}
A'_1 &= A_1 \\
A'_2 &= A_2 \\
A'_3 &= \gamma\,(A_3 + i\beta A_4) \\
A'_4 &= \gamma\,(-i\beta A_3 + A_4),
\end{aligned}$$

where $\gamma$ is given by (2.6) and $\beta = v/c$. This is known as a Lorentz 
transformation. Show that the inner product of any two four-vectors 
$A_\mu$ and $B_\mu$ satisfies 


$$\sum_{\mu=1}^{4} A'_\mu B'_\mu = \sum_{\mu=1}^{4} A_\mu B_\mu.$$


An inner product of two four-vectors is thus said to be invariant 
under a Lorentz transformation. 


2. Derive the Lorentz force law (2.15) from the Euler-Lagrange 
equations of motion (2.14). 
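
Before attempting the proof in Problem 1, it can be helpful to check the claimed invariance numerically. The following is a minimal sketch, assuming the convention of (2.5) in which the fourth component of a four-vector is imaginary. The function names and the sample component values are hypothetical. A numerical check of this kind is not a proof.

```python
import math


def boost_z(a, beta):
    """Lorentz transformation of Problem 1, applied to a = (a1, a2, a3, a4)."""
    gamma = 1.0 / math.sqrt(1.0 - beta * beta)
    a1, a2, a3, a4 = a
    return (a1, a2,
            gamma * (a3 + 1j * beta * a4),
            gamma * (-1j * beta * a3 + a4))


def inner(a, b):
    """Inner product of two four-vectors, summed over all four components."""
    return sum(x * y for x, y in zip(a, b))


# Two sample four-vectors with imaginary fourth components.
a = (1.0, -2.0, 0.5, 3.0j)
b = (0.3, 4.0, -1.5, 2.0j)
beta = 0.6

before = inner(a, b)
after = inner(boost_z(a, beta), boost_z(b, beta))
print("inner product before the transformation:", before)
print("inner product after the transformation:", after)

# Four-velocity (2.5) for motion along z at speed 0.6 c, in units c = 1.
gamma = 1.0 / math.sqrt(1.0 - beta * beta)
u = (0.0, 0.0, gamma * beta, 1j * gamma)
print("U.U in units of c^2:", inner(u, u))
```

The last line evaluates the inner product of the four-velocity with itself, which should equal -c² for any speed. This is why the first term of (2.4) reduces to the first term of (2.9) after division by γ.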


2.1.2 The Hamiltonian function and energy conservation 


We define a new function $H$ by 

$$H(\mathbf{x}, \mathbf{P}; t) = \sum_{i=1}^{3} P_i v_i - L(\mathbf{x}, \mathbf{v}; t), \tag{2.19}$$


where P is an arbitrary three-vector, whose meaning will become 
clear in the following. The scalar function H is derived from the 
Lagrangian L by a specific transformation called a Legendre trans- 
formation [12]. We form the total time derivative of H by invoking 
the chain rule, 


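$$\frac{dH}{dt} = \sum_{i=1}^{3} \left( \frac{\partial H}{\partial x_i}\frac{dx_i}{dt} + \frac{\partial H}{\partial P_i}\frac{dP_i}{dt} \right) + \frac{\partial H}{\partial t}, \tag{2.20}$$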


recalling that v; = dx;/dt. From the definition (2.19) we obtain 
the identities 


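$$\frac{\partial H}{\partial t} = -\frac{\partial L}{\partial t}, \qquad \frac{\partial H}{\partial x_i} = -\frac{\partial L}{\partial x_i}, \qquad \frac{\partial H}{\partial v_i} = P_i - \frac{\partial L}{\partial v_i} = 0. \tag{2.21}$$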


The third of these, together with (2.14) leads to 


$$\frac{\partial L}{\partial x_i} = \frac{dP_i}{dt}. \tag{2.22}$$


It follows that the large parenthesis in (2.20) vanishes identically, 
and 

$$\frac{dH}{dt} = -\frac{\partial L}{\partial t}. \tag{2.23}$$


The function $H$ is called the Hamiltonian function, and the three-vector 
$\mathbf{P}$ is called the canonical momentum. From (2.9) and the 
third identity (2.21), the canonical momentum components can be 
written as 


$$P_i = \gamma m v_i + q A_i, \qquad i = 1, 2, 3. \tag{2.24}$$

Equivalently, from (2.18), this is 

$$P_i = p_i + q A_i. \tag{2.25}$$


The canonical momentum is thus the sum of the kinetic momen- 
tum plus the charge times the magnetic vector potential. Obvi- 
ously, the canonical momentum and kinetic momentum are iden- 
tical in the case where the magnetic vector potential is zero. 
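
The step from (2.9) to (2.24) can be written out explicitly. The following is a sketch of the intermediate algebra, using the third identity of (2.21) in the form $P_i = \partial L/\partial v_i$, the Lagrangian (2.9), and the definition of $\gamma$ in (2.6). Here $v^2 = v_1^2 + v_2^2 + v_3^2$, and the potentials do not depend on the velocity.

```latex
\begin{aligned}
P_i = \frac{\partial L}{\partial v_i}
  &= \frac{\partial}{\partial v_i}\left[ -mc^2\sqrt{1 - v^2/c^2}
     + q\,\mathbf{v}\cdot\mathbf{A} - q\,\phi \right] \\
  &= -mc^2 \cdot \frac{1}{2}\left(1 - v^2/c^2\right)^{-1/2}
     \left( -\frac{2 v_i}{c^2} \right) + q A_i \\
  &= \frac{m v_i}{\sqrt{1 - v^2/c^2}} + q A_i
   = \gamma m v_i + q A_i .
\end{aligned}
```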


Next we consider the special case where the potentials $\mathbf{A}$ and $\phi$ 
have no explicit time dependence; i.e., the fields are static. From 
(2.9) it follows that the right side of (2.23) vanishes, and 


$$\frac{dH}{dt} = 0. \tag{2.26}$$


This means that H is a conserved quantity in this case. From (2.9, 
2.19, 2.24, 2.26) it follows that 


$$H = \gamma m c^2 + q\phi = \text{const}, \tag{2.27}$$


and H is a constant of the motion. We will see in the following 
that H can be identified with the total energy. 
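
The algebra leading to (2.27) can be sketched as follows, substituting the canonical momentum (2.24) and the Lagrangian (2.9) into the definition (2.19), and using $1/\gamma^2 = 1 - v^2/c^2$ from (2.6). The magnetic terms cancel, which anticipates the statement that the energy does not depend on $\mathbf{A}$.

```latex
\begin{aligned}
H &= \sum_{i=1}^{3} P_i v_i - L \\
  &= \left( \gamma m v^2 + q\,\mathbf{v}\cdot\mathbf{A} \right)
     - \left( -\frac{mc^2}{\gamma} + q\,\mathbf{v}\cdot\mathbf{A} - q\,\phi \right) \\
  &= \gamma m c^2 \left( \frac{v^2}{c^2} + \frac{1}{\gamma^2} \right) + q\,\phi \\
  &= \gamma m c^2 + q\,\phi .
\end{aligned}
```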


The energy H does not depend on the magnetic vector potential 
A, because the magnetic Lorentz force in (2.15) acts in a direction 
perpendicular to the particle velocity v. As a result, the magnetic 
force alters the direction of the velocity v, but not the magnitude. 
Consequently, the magnetic force cannot cause a change in energy. 


We now proceed to define two quantities which will prove very 
useful later. We define a quantity $E$ by 

$$E = \gamma m c^2, \tag{2.28}$$


where $mc^2$ is the rest energy. We further define the kinetic energy 
$T$ by 

$$E = T + mc^2. \tag{2.29}$$

By this definition, the energy $E$ is the sum of the kinetic energy 
plus the rest energy. The Hamiltonian $H$ is then 


$$H = T + mc^2 + q\phi. \tag{2.30}$$


The Hamiltonian is the sum of the kinetic energy plus the rest en- 
ergy plus the potential energy. The Hamiltonian H thus represents 
the total energy. It is conserved in the case where the potentials $\phi$ 
and A have no explicit time dependence. Any force which acts in 
such a way that the total energy is constant is called a conserva- 
tive force. 
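
The quantities defined in (2.18), (2.28), and (2.29) can be evaluated for beam energies of the kind quoted in the introduction. The following is a minimal sketch, assuming an electron whose kinetic energy T is given in electron volts. The function name and the chosen energies of 20 keV and 60 keV are illustrative only.

```python
import math

C = 299_792_458.0            # speed of light (m/s)
M_E = 9.1093837015e-31       # electron rest mass (kg)
E_CHARGE = 1.602176634e-19   # elementary charge (C), converts eV to J


def kinematics(kinetic_ev, m=M_E):
    """Return (gamma, beta, p, E) for a particle of rest mass m
    with kinetic energy T given in electron volts."""
    t = kinetic_ev * E_CHARGE            # kinetic energy T (J)
    rest = m * C ** 2                    # rest energy m c^2 (J)
    energy = t + rest                    # E = T + m c^2, as in (2.29)
    gamma = energy / rest                # from E = gamma m c^2, as in (2.28)
    beta = math.sqrt(1.0 - 1.0 / gamma ** 2)
    p = gamma * m * beta * C             # scalar kinetic momentum, from (2.18)
    return gamma, beta, p, energy


for kinetic_ev in (20e3, 60e3):
    gamma, beta, p, energy = kinematics(kinetic_ev)
    print(f"T = {kinetic_ev / 1e3:.0f} keV: gamma = {gamma:.6f}, "
          f"beta = {beta:.6f}, p = {p:.4e} kg m/s")
```

Even at these modest energies the electron speed is a substantial fraction of the speed of light, which is one reason the relativistic form of the mechanics is used throughout.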


Problems 


1. Show from the above analysis that 

$$E^2 = p^2 c^2 + m^2 c^4, \tag{2.31}$$

where $p^2 = \mathbf{p}\cdot\mathbf{p}$. 


2. Prove the identity 

$$pc = \beta E, \tag{2.32}$$


where $\beta = v/c$. 


2.1.3 Mechanical analog of Fermat’s principle 


We now concentrate on the important special case where the electric 
and magnetic fields are constant in time. Mathematically, this 
is equivalent to the potentials $\mathbf{A}(\mathbf{x},t) = \mathbf{A}(\mathbf{x})$ and $\phi(\mathbf{x},t) = \phi(\mathbf{x})$ 
having no explicit time dependence. We showed in the preceding 
section that the Hamiltonian represents the conserved total energy 
in this case (2.27). We now define a quantity $W_{ab}$ as the component 
of the canonical momentum $\mathbf{P}$ along the trajectory path, 
integrated over the path between the two endpoints $\mathbf{x}_a$ and $\mathbf{x}_b$. It 
is given by 

$$W_{ab} = \int_{\mathbf{x}_a}^{\mathbf{x}_b} \mathbf{P}\cdot d\mathbf{s}, \tag{2.33}$$

where the integration path is assumed to be the path of physically 
allowable particle motion, satisfying (2.14, 2.27). Equivalently, 


$$W_{ab} = \int_{t_a}^{t_b} \left( \sum_{i=1}^{3} P_i v_i \right) dt. \tag{2.34}$$


The function $W_{ab}$ is the integral of the action along the path. It 
is also known as the eikonal function. From (2.19), 


$$W_{ab} = \int_{t_a}^{t_b} (L + H)\, dt = \int_{t_a}^{t_b} L\, dt + H\,(t_b - t_a), \tag{2.35}$$

where, in the rightmost equality, we only consider possible motion 
for which $H = \text{const}$. The variation is 


$$\delta W_{ab} = \delta \int_{t_a}^{t_b} L\, dt + H\,(\delta t_b - \delta t_a). \tag{2.36}$$

This variation is shown schematically in Figure 2.1, where the solid 
curve represents the physically allowable path, and the broken 
curve represents an infinitesimally displaced path, which is not 
physically allowable. The endpoints are held fixed by assumption 
in the variation. In order that H = const, it is necessary to allow 
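the end times $t_a$ and $t_b$ to vary. This is different from Hamilton’s 
principle (2.7), where the end times $t_a$ and $t_b$ are assumed to be 
fixed. Consequently, in the present case, 


$$\delta \int_{t_a}^{t_b} L\, dt = \left( \delta t_b \frac{\partial}{\partial t_b} + \delta t_a \frac{\partial}{\partial t_a} \right) \int_{t_a}^{t_b} L\, dt + \int_{t_a}^{t_b} \sum_{i=1}^{3} \left( \frac{\partial L}{\partial x_i}\,\delta x_i + \frac{\partial L}{\partial v_i}\,\delta v_i \right) dt, \tag{2.37}$$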


where the first term on the right accounts for the variation of the 
end times $t_a$ and $t_b$, and the second term accounts for the variation 
of the integrand. The integrand of the second term on the right 
can be rewritten (2.14) as 


$$\frac{\partial L}{\partial x_i}\,\delta x_i + \frac{\partial L}{\partial v_i}\,\delta v_i = \delta x_i\,\frac{d}{dt}\left( \frac{\partial L}{\partial v_i} \right) + \frac{\partial L}{\partial v_i}\,\frac{d}{dt}(\delta x_i) = \frac{d}{dt}\left( \frac{\partial L}{\partial v_i}\,\delta x_i \right). \tag{2.38}$$


Figure 2.1: Variation of the particle trajectory for fixed endpoints 


From the third identity of (2.21) together with (2.37, 2.38), 


$$\delta \int_{t_a}^{t_b} L\, dt = \Big[ L\,\delta t \Big]_{t_a}^{t_b} + \sum_{i=1}^{3} \Big[ P_i\,\delta x_i \Big]_{t_a}^{t_b}. \tag{2.39}$$

We now impose the condition that the endpoints $\mathbf{x}_a$ and $\mathbf{x}_b$ remain 
fixed. To ensure this, we require that $\delta x_i = -v_i\,\delta t$ at the end times 
$t_a$ and $t_b$, to compensate for what would otherwise be an offset of 
the endpoints. From (2.36, 2.39) 


$$\delta W_{ab} = \left[ \left( -\sum_{i=1}^{3} P_i v_i + L + H \right) \delta t \right]_{t_a}^{t_b} = 0, \tag{2.40}$$

where the large parenthesis vanishes identically from (2.19). This 
is the principle of least action for the special case where the potentials 
$\mathbf{A}(\mathbf{x})$ and $\phi(\mathbf{x})$ contain no explicit time dependence. This 
can also be written (2.33) as 


$$\delta \int_{\mathbf{x}_a}^{\mathbf{x}_b} \mathbf{P}\cdot d\mathbf{s} = 0, \tag{2.41}$$

where the endpoints $\mathbf{x}_a$ and $\mathbf{x}_b$ are assumed to be fixed. The integral 
has units of action. The equation (2.41) can be regarded 
as the principle of least action for the case where the potentials 
have no explicit time dependence. We have shown that this is a 
necessary condition for physically allowable motion. 


We define a scalar quantity $n$ as the component of canonical momentum 
along the path of motion (2.25): 


$$n = \mathbf{P}\cdot\hat{\mathbf{s}} = p + q\,\mathbf{A}\cdot\hat{\mathbf{s}}, \tag{2.42}$$


where $\hat{\mathbf{s}}$ is the unit vector along the direction of motion, locally 
tangent to the trajectory, and $p$ is the scalar kinetic momentum. 
From (2.41, 2.42), the principle of least action can also be written 
as 


$$\delta \int_{\mathbf{x}_a}^{\mathbf{x}_b} n\, ds = 0. \tag{2.43}$$


A close analogy exists with light optics. Fermat’s principle states 
that light propagates along that path which minimizes the transit 
time between two points. This can be written as a variational 
principle as follows: 

$$\delta \int_{t_a}^{t_b} dt = 0. \tag{2.44}$$

The speed of light is path length traversed per unit time, or $ds/dt$, 
where $ds$ is the element of path length. From the Maxwell theory, 
an electromagnetic wave travels with phase velocity $v$ given by 

$$v = \frac{c}{n'}, \tag{2.45}$$

where $n'$ is the index of refraction of the medium, and $n' = 1$ in 
vacuum. Substituting, 


$$\delta \int_{\mathbf{x}_a}^{\mathbf{x}_b} n'\, ds = 0. \tag{2.46}$$

The physical path taken by the light is that path for which the 
integral in (2.46) is a minimum. The equations (2.43) and (2.46) 
are formally identical, expressing a close analogy between light 
propagation and particle propagation. The index of refraction $n'$ 
for light varies in general with position within the medium. The 
quantity $n$ in (2.42) is identified with an index of refraction for a 
particle. It depends on the electrostatic potential $\phi(\mathbf{x})$ through the 
momentum $p$, and depends on the magnetic vector potential $\mathbf{A}(\mathbf{x})$ 
explicitly. The electromagnetic potential varies slowly in space, as 
governed by Maxwell’s equations. 
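
One consequence of (2.42) is that the particle index of refraction depends on the direction of travel whenever the magnetic vector potential is nonzero. The following is a minimal sketch that evaluates n for several directions. The function name and the numerical values, which are in arbitrary consistent units, are hypothetical and chosen only for illustration.

```python
import math


def refractive_index(p, q, a_vec, s_vec):
    """Particle index of refraction n = p + q A . s_hat, as in (2.42).
    s_vec gives the direction of motion and is normalized internally."""
    norm = math.sqrt(sum(comp * comp for comp in s_vec))
    s_hat = [comp / norm for comp in s_vec]
    return p + q * sum(a * s for a, s in zip(a_vec, s_hat))


# Hypothetical values in arbitrary consistent units.
p = 2.0                      # scalar kinetic momentum
q = -1.0                     # charge
a_vec = (0.0, 0.0, 0.5)      # magnetic vector potential at the point of interest

n_forward = refractive_index(p, q, a_vec, (0.0, 0.0, 1.0))
n_backward = refractive_index(p, q, a_vec, (0.0, 0.0, -1.0))
n_transverse = refractive_index(p, q, a_vec, (1.0, 0.0, 0.0))

print("n along +z:", n_forward)
print("n along -z:", n_backward)
print("n along +x:", n_transverse)
```

In this sketch the index differs for motion along and against the vector potential, and equals the scalar momentum p for motion perpendicular to it. When A = 0, the index reduces to p for every direction.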


Formulation of the dynamical problem in this way has the ad- 
vantage that it does not rely on time as an explicit parameter, as 
long as the potentials are time independent. This greatly simpli- 
fies the discussion of geometrical optics for this important class of 
problems. For example, in many particle beam instruments we are 
only interested in where the ray ends, but not in the time at which 
the particle arrives. 


In the following sections, we will make use of the variational prin- 
ciple (2.43) to solve for the detailed physical trajectory. 


2.2 Exact trajectory equation for a single particle 


We now make use of the preceding analysis to find an explicit differential 
equation governing particle motion for time independent 
potentials. The following analysis closely follows that of Sturrock 
[86]. We seek a condition, based on the principle of least action 
for time independent potentials, which will allow a solution for the 
position $\mathbf{x}$ at all points along a physical trajectory. Expanding the 
variation (2.43) we have 


$$\delta W_{ab} = \delta \int_{\mathbf{x}_a}^{\mathbf{x}_b} n\, ds = \int_{\mathbf{x}_a}^{\mathbf{x}_b} \left[ \delta n + n\,\frac{d}{ds}(\delta s) \right] ds, \tag{2.47}$$

where we assume the endpoints $\mathbf{x}_a$ and $\mathbf{x}_b$ remain fixed. The first 
term in the square bracket is the variation of the refractive index, 
and the second term is the variation of the path of integration. We 
assume for now that this applies to an arbitrary path, not necessarily 
a physically allowable trajectory. 


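Expanding the differential path length $ds$ in terms of the position 
$d\mathbf{x}$, we find 


$$(ds)^2 = d\mathbf{x}\cdot d\mathbf{x}. \tag{2.48}$$

Taking the differential of both sides, 

$$(ds)\,\delta(ds) = d\mathbf{x}\cdot\delta(d\mathbf{x}). \tag{2.49}$$

The unit vector $\hat{\mathbf{s}}$ along the path can be written as 

$$\hat{\mathbf{s}} = \frac{d\mathbf{x}}{ds}. \tag{2.50}$$


Interchanging the order of differentials in (2.49), it follows (2.50) 
that 


$$\frac{d}{ds}(\delta s) = \hat{\mathbf{s}}\cdot\frac{d}{ds}(\delta\mathbf{x}), \tag{2.51}$$

from which 

$$\delta\hat{\mathbf{s}} = \delta\left( \frac{d\mathbf{x}}{ds} \right) = \frac{d}{ds}(\delta\mathbf{x}) - \hat{\mathbf{s}} \left[ \hat{\mathbf{s}}\cdot\frac{d}{ds}(\delta\mathbf{x}) \right]. \tag{2.52}$$

Expanding the variation $\delta n$, 

$$\delta n = \nabla_{\mathbf{x}} n\cdot\delta\mathbf{x} + \nabla_{\hat{\mathbf{s}}} n\cdot\delta\hat{\mathbf{s}}, \tag{2.53}$$


where (2.42) 

$$\nabla_{\hat{\mathbf{s}}} n = q\mathbf{A}. \tag{2.54}$$

We obtain an expression (2.42, 2.52, 2.53, 2.54) for the integrand 
in (2.47) as 


$$\delta n + n\,\frac{d}{ds}(\delta s) = (\nabla_{\mathbf{x}} n)\cdot\delta\mathbf{x} + \mathbf{P}\cdot\frac{d}{ds}(\delta\mathbf{x}). \tag{2.55}$$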
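The chain rule gives 

$$\frac{d}{ds}(\mathbf{P}\cdot\delta\mathbf{x}) = \mathbf{P}\cdot\frac{d}{ds}(\delta\mathbf{x}) + \frac{d\mathbf{P}}{ds}\cdot\delta\mathbf{x}, \tag{2.56}$$


from which it follows (2.47, 2.55, 2.56) 


$$\delta W_{ab} = \Big[ \mathbf{P}\cdot\delta\mathbf{x} \Big]_{\mathbf{x}_a}^{\mathbf{x}_b} + \int_{\mathbf{x}_a}^{\mathbf{x}_b} \delta\mathbf{x}\cdot\left( \nabla_{\mathbf{x}} n - \frac{d\mathbf{P}}{ds} \right) ds. \tag{2.57}$$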


We now invoke the principle of least action (2.43), namely, $\delta W_{ab} = 0$. 
The first term on the right is zero, as the endpoints are assumed 
to be fixed, i.e., $\delta\mathbf{x}_a = \delta\mathbf{x}_b = 0$. As $\delta\mathbf{x}$ under the integral on the 
right is arbitrary, it becomes a necessary condition that the large 
parenthesis in (2.57) must vanish, i.e., 


$$\frac{d\mathbf{P}}{ds} - \nabla_{\mathbf{x}} n = 0. \tag{2.58}$$

This represents the exact trajectory equation, relativistically correct 
in the lab frame, where we recall (2.18, 2.25, 2.42). For specified 
endpoints $\mathbf{x}_a$ and $\mathbf{x}_b$, this equation can be solved in principle 
to find the position $\mathbf{x}$ everywhere along a single trajectory of a 
single particle. 
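
As a consistency check on the trajectory equation (2.58), it is instructive to write out the purely electrostatic case. The following sketch assumes $\mathbf{A} = 0$, so that $n = p$ and $\mathbf{P} = p\,\hat{\mathbf{s}}$ from (2.25, 2.42), with $p$ depending on position only through the potential. It also uses $dp/ds = \hat{\mathbf{s}}\cdot\nabla_{\mathbf{x}} p$ for the rate of change of $p$ along the path.

```latex
\begin{aligned}
\frac{d}{ds}\left( p\,\hat{\mathbf{s}} \right) &= \nabla_{\mathbf{x}}\, p, \\
p\,\frac{d\hat{\mathbf{s}}}{ds} + \hat{\mathbf{s}}\,\frac{dp}{ds} &= \nabla_{\mathbf{x}}\, p, \\
p\,\frac{d\hat{\mathbf{s}}}{ds}
  &= \nabla_{\mathbf{x}}\, p - \hat{\mathbf{s}}\left( \hat{\mathbf{s}}\cdot\nabla_{\mathbf{x}}\, p \right).
\end{aligned}
```

Under these assumptions the ray bends only in response to the component of the gradient of $p$ transverse to the direction of motion. In a field-free region $p$ is constant, the right side vanishes, and the trajectory is a straight line.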


2.3 Conservation laws 


We showed previously that, in the case where the potentials $\mathbf{A}(\mathbf{x})$ 
and $\phi(\mathbf{x})$ have no explicit time dependence, the total energy $H$ is 
a constant of the motion. In this section, we show that other invariant 
quantities exist, as a direct consequence of the least action 
principle. As in the preceding section, the reader is referred to the 
book by Sturrock [86] for a detailed discussion. 


2.3.1 The Lagrange invariant 


In the preceding sections, we derived the necessary conditions for a 
single trajectory to represent physically allowable motion. Hence- 
forth we refer to a physically allowable trajectory satisfying (2.43, 
2.58) as a ray. In this section, we consider the behavior of rays 
which are infinitesimally displaced from one another. This is shown 
schematically in Figure 2.2. From (2.57) the variation in optical 
path length between two neighboring rays is given by 


$$\delta W_{ab} = \mathbf{P}_b\cdot\delta\mathbf{x}_b - \mathbf{P}_a\cdot\delta\mathbf{x}_a. \tag{2.59}$$


This infinitesimal quantity is nonzero in general, since the endpoints 
$\mathbf{x}_a$ and $\mathbf{x}_b$ of the two rays are assumed in general to be 
displaced from one another. It can be shown that $\delta W_{ab}$ is an 
exact differential [72], in which case 


$$\mathbf{P}_b = \nabla_{\mathbf{x}_b} W_{ab}, \qquad \mathbf{P}_a = -\nabla_{\mathbf{x}_a} W_{ab}. \tag{2.60}$$


Geometrically, this means that the canonical momentum $\mathbf{P}$ is normal 
to surfaces of constant optical path length, $W_{ab} = \text{const}$ at 
the endpoints, where we note that the endpoints can be chosen to 
be anywhere along the ray path. 


We now consider a second perturbation, independent from the 
first. It follows that (2.59): 

$$d(\delta W_{ab}) = d\mathbf{P}_b\cdot\delta\mathbf{x}_b + \mathbf{P}_b\cdot d(\delta\mathbf{x}_b) - d\mathbf{P}_a\cdot\delta\mathbf{x}_a - \mathbf{P}_a\cdot d(\delta\mathbf{x}_a). \tag{2.61}$$


Interchanging the order of perturbations and subtracting, we obtain 


$$d\mathbf{P}_a\cdot\delta\mathbf{x}_a - \delta\mathbf{P}_a\cdot d\mathbf{x}_a = d\mathbf{P}_b\cdot\delta\mathbf{x}_b - \delta\mathbf{P}_b\cdot d\mathbf{x}_b. \tag{2.62}$$


Figure 2.2: Two rays, infinitesimally displaced from one another. 


Since $\mathbf{x}_a$ and $\mathbf{x}_b$ can be any two points connected by a ray, it follows 
that 

$$d\mathbf{P}\cdot\delta\mathbf{x} - \delta\mathbf{P}\cdot d\mathbf{x} = \text{const}, \tag{2.63}$$


where the $\delta$- and $d$-variations refer to two separate rays, each derived 
by an independent perturbation from the original ray. This 
quantity is known as the Lagrange invariant. To appreciate the 
meaning of the Lagrange invariant, we consider a special case for 
which $d\mathbf{x}_b = \delta\mathbf{x}_a = 0$. In this case (2.62) reduces to 


$$-\delta\mathbf{P}_a\cdot d\mathbf{x}_a = d\mathbf{P}_b\cdot\delta\mathbf{x}_b. \tag{2.64}$$


This is shown schematically in Figure 2.3, where the unperturbed 
ray is represented by a straight line connecting the beginning point 
a and the endpoint b. We now choose local z-axes codirectional 
with $\mathbf{P}_a$ and $\mathbf{P}_b$ at the respective endpoints $\mathbf{x}_a$ and $\mathbf{x}_b$. We further 
choose $\delta\mathbf{x}_b$ colinear (either codirectional or antidirectional) with 
$d\mathbf{P}_b$ and $d\mathbf{x}_a$ colinear with $\delta\mathbf{P}_a$, in the respective transverse end 
planes. Since $\delta\mathbf{P}_a$ is perpendicular to $\mathbf{P}_a$, this represents a change 
in direction, but not magnitude, of $\mathbf{P}_a$. A similar statement holds 
for $\mathbf{P}_b$. The Lagrange invariant (2.62) reduces to 


$$-\delta P_a\, dx_a = dP_b\, \delta x_b. \tag{2.65}$$


Figure 2.3: Perturbed rays for case $d\mathbf{x}_b = \delta\mathbf{x}_a = 0$. 


We notice (2.25) that $\delta P_a = p_a\,\delta\theta_a$ and $dP_b = p_b\,d\theta_b$, since the 
magnetic vector potential $\mathbf{A}$ is assumed to be unchanged in the 
perturbation. Recalling that $p$ is the scalar kinetic momentum, it 
follows that 

$$-p_a\,\delta\theta_a\,dx_a = p_b\,d\theta_b\,\delta x_b, \tag{2.66}$$


where $dx_a$ is proportional to $d\theta_b$, and $\delta\theta_a$ is proportional to $\delta x_b$. 
Repeating this variation process in the orthogonal transverse axis, 
and multiplying, 


$$p_a^2\,\delta\Omega_a\,dA_a = p_b^2\,d\Omega_b\,\delta A_b, \tag{2.67}$$


where $d\Omega = d\theta_x\,d\theta_y$ is the solid angle element, and $dA = dx\,dy$ is 
the transverse area element. 


Figure 2.4: Perturbed rays for case $d\mathbf{x}_a = d\mathbf{x}_b = 0$. 


Next we consider the special case where 6x, = 0x, = 0. This is 
shown schematically in Figure 2.4, where, again, the unperturbed 
ray is represented by a straight line connecting the beginning point 
a and the endpoint b. The two neighboring rays emanating from a 
single point, with infinitesimally differing directions intersect the 
same endpoint. In this case the endpoints x, and x, are said to 
be optically conjugate. Because 0x, = 6x, = 0, it follows directly 
from (2.59) that Wap = 0. This means that the two rays have 
identical optical path length Wa». This is equivalent to the state- 
ment that dW,» is a perfect differential, since the line integral of 
Wap around the closed path of the two rays is zero. 


The Lagrange invariant (2.62) reduces in this case to

$$ d\mathbf{P}_a \cdot \delta\mathbf{x}_a = d\mathbf{P}_b \cdot \delta\mathbf{x}_b. \qquad (2.68) $$

Applying the preceding method,

$$ p_a\,d\theta_a\,\delta x_a = p_b\,d\theta_b\,\delta x_b. \qquad (2.69) $$

This is known as the law of Helmholtz-Lagrange [86]. We define
the magnification $M = \delta x_b/\delta x_a$, in which case the angular
magnification is given by $d\theta_b/d\theta_a = p_a/(M p_b)$. Repeating this in the
orthogonal axis as before, it follows that

$$ p_a^2\,d\Omega_a\,dA_a = p_b^2\,d\Omega_b\,dA_b. \qquad (2.70) $$

The product of transverse area element times solid angle element
is called the emittance. Equation (2.70) shows that the product
of the emittance times the square of the momentum is conserved.
For a ray bundle, the current divided by the emittance is called
the brightness. It follows from (2.67, 2.70) that the ratio of the
brightness $\beta$ divided by the square of the momentum is conserved,
assuming constant current. This can be written as

$$ \frac{\beta}{p^2} = \text{const}, \qquad (2.71) $$
where p is the relativistic scalar kinetic momentum. It does not re- 
quire that the two end planes be optically conjugate, as it applies 
to both of the above special cases. It follows that it is impossible 
to focus any beam to a spot which is brighter than the source. 
These arguments apply strictly only over infinitesimal regions. It 
is common practice to apply brightness conservation to a finite 
region, such as a whole beam. This is only approximate, however, 
and becomes less accurate as the whole beam becomes larger. 


Next it is interesting to consider the special case where $\mathbf{P}_a$ is
inclined by an angle $\theta_a$ to the local z-axis. The above case becomes

$$ p_a \cos\theta_a\,d\theta_a = M\,p_b \cos\theta_b\,d\theta_b. \qquad (2.72) $$

To this point, we have considered only infinitesimal perturbations
of first order. It is interesting to consider the case in which rays
inclined at finite angle $\theta_a$ intersect the same image point $\mathbf{x}_b$ for all
$\theta_a$. This corresponds to perfect imaging, without aberration. We
integrate as follows:

$$ p_a \int_0^{\theta_a} \cos\theta_a\,d\theta_a = M\,p_b \int_0^{\theta_b} \cos\theta_b\,d\theta_b, $$
$$ p_a \sin\theta_a = M\,p_b \sin\theta_b, \qquad (2.73) $$

for all $\theta_a$ and $\theta_b$. This is presumed true independent of $\delta x_a$, in
which case it represents perfect imaging with regard to all aberrations
which are linear in $x_a$; i.e., coma. This is known as the
Abbe-Helmholtz sine condition for coma-free imaging [86].


An analogous case exists where we assume $\delta\mathbf{x}_a$ to be parallel with
the axis, and $\mathbf{P}_a$ inclined at angle $\theta_a$. We find that

$$ p_a \sin\theta_a\,d\theta_a = M_L\,p_b \sin\theta_b\,d\theta_b, \qquad (2.74) $$

where $M_L = \delta z_b/\delta z_a$ is defined as the longitudinal magnification.
Assuming perfect imaging as before, it follows that

$$ p_a \int_0^{\theta_a} \sin\theta_a\,d\theta_a = M_L\,p_b \int_0^{\theta_b} \sin\theta_b\,d\theta_b, $$
$$ p_a \sin^2(\theta_a/2) = M_L\,p_b \sin^2(\theta_b/2), \qquad (2.75) $$

for all $\theta_a$ and $\theta_b$. This is known as Herschel’s condition for vanishing
spherical aberration [86]. It follows from the preceding that, for
small angles, the longitudinal and transverse magnifications are
related by

$$ M_L = M^2\,p_b/p_a. \qquad (2.76) $$
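
The relation (2.76) can be checked numerically. The following is a minimal sketch, not taken from the text: it assumes illustrative values for M, p_a, and p_b, computes the image-side angle from the sine condition (2.73), and then evaluates the longitudinal magnification implied by Herschel's condition (2.75). The function names are hypothetical.

```python
import math

def sine_condition_angle(theta_a, M, p_a=1.0, p_b=1.0):
    """Image-side angle theta_b from (2.73): p_a sin(theta_a) = M p_b sin(theta_b)."""
    return math.asin(p_a * math.sin(theta_a) / (M * p_b))

def herschel_longitudinal_magnification(theta_a, theta_b, p_a=1.0, p_b=1.0):
    """Longitudinal magnification M_L implied by (2.75)."""
    return p_a * math.sin(theta_a / 2) ** 2 / (p_b * math.sin(theta_b / 2) ** 2)

# Illustrative (assumed) values: magnification 5, momentum doubled at the image.
M, p_a, p_b = 5.0, 1.0, 2.0

# Small angle: both conditions hold together and reproduce (2.76).
theta_a = 1e-3
theta_b = sine_condition_angle(theta_a, M, p_a, p_b)
M_L_small = herschel_longitudinal_magnification(theta_a, theta_b, p_a, p_b)
M_L_expected = M ** 2 * p_b / p_a

# Large angle: the value implied by (2.75) drifts away from (2.76), which
# shows that the two conditions are only compatible for small angles.
theta_a_large = 0.5
theta_b_large = sine_condition_angle(theta_a_large, M, p_a, p_b)
M_L_large = herschel_longitudinal_magnification(theta_a_large, theta_b_large, p_a, p_b)

print(M_L_small, M_L_expected, M_L_large)
```

In this sketch the small-angle value agrees with (2.76) to high accuracy, while the large-angle value does not, which is why the sine and Herschel conditions cannot in general be satisfied simultaneously at finite aperture.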


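By successive applications of the Legendre transformation, it
is possible to construct other characteristic functions from
$W(\mathbf{x}_a, \mathbf{x}_b)$. For example, let

$$ V(\mathbf{x}_a, \mathbf{P}_b) = \mathbf{P}_b \cdot \mathbf{x}_b - W(\mathbf{x}_a, \mathbf{x}_b). \qquad (2.77) $$

It follows that

$$ \delta V = \mathbf{P}_a \cdot \delta\mathbf{x}_a + \mathbf{x}_b \cdot \delta\mathbf{P}_b. \qquad (2.78) $$

Continuing this procedure, we define

$$ X(\mathbf{P}_a, \mathbf{x}_b) = \mathbf{P}_a \cdot \mathbf{x}_a + W(\mathbf{x}_a, \mathbf{x}_b). \qquad (2.79) $$

It follows that

$$ \delta X = \mathbf{x}_a \cdot \delta\mathbf{P}_a + \mathbf{P}_b \cdot \delta\mathbf{x}_b. \qquad (2.80) $$

Similarly we define

$$ Y(\mathbf{P}_a, \mathbf{P}_b) = -\mathbf{P}_a \cdot \mathbf{x}_a + V(\mathbf{x}_a, \mathbf{P}_b). \qquad (2.81) $$

It follows that

$$ \delta Y = -\mathbf{x}_a \cdot \delta\mathbf{P}_a + \mathbf{x}_b \cdot \delta\mathbf{P}_b. \qquad (2.82) $$

The functions V, W, X, and Y represent a way to describe the optical
coupling between the space $(\mathbf{x}_a, \mathbf{P}_a)$ and the space $(\mathbf{x}_b, \mathbf{P}_b)$
in an infinitesimal region surrounding a ray.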


Problem 


Show that the functions V, W, X, and Y all lead to the same La- 
grange invariant. 


2.3.2 Liouville’s theorem and brightness conservation


The motion of a particle can be considered to trace out a trajectory 
in a six-dimensional space, for which the coordinates are labeled 
by the three position components of x and the three canonical mo- 
mentum components of P. This is called phase space. The reader 
is referred to Goldstein et al. [35] for background and further
details. 


To introduce this description, we notice (2.14, 2.19) that

$$ \frac{\partial H}{\partial x_j} = -\frac{\partial L}{\partial x_j} = -\frac{d}{dt}\left(\frac{\partial L}{\partial v_j}\right) = -\frac{dP_j}{dt}, \qquad (2.83) $$

and (2.19) that

$$ \frac{\partial H}{\partial P_j} = \frac{dx_j}{dt}, \qquad (2.84) $$

for j = 1,...,3. Summarizing, this yields a coupled set of six
first-order equations as follows:

$$ \frac{\partial H}{\partial x_j} = -\frac{dP_j}{dt}, \qquad \frac{\partial H}{\partial P_j} = \frac{dx_j}{dt}, \qquad (2.85) $$

where j = 1,...,3. These are known as Hamilton’s equations of
motion. Given the Hamiltonian function (2.9, 2.19) together with
an initial condition $(\mathbf{x}_0, \mathbf{P}_0)$ at any single phase space point along
the trajectory, Hamilton’s equations can be solved in principle to
find the entire phase space trajectory of a single particle.


We imagine a family of trajectories, all infinitesimally displaced 
from one another, with each corresponding to a slightly differ- 
ent initial condition. These trajectories cannot intersect in phase 
space, as to do so would imply that a single initial condition would 
give rise to multiple end conditions. As such, an analogy exists with 
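fluid flow, where the trajectories can be described by a flux $\mathbf{j}$ and
a density $\rho$ of points in phase space. As trajectories are conserved,
these quantities obey a continuity equation

$$ \nabla \cdot \mathbf{j} + \frac{\partial \rho}{\partial t} = 0, \qquad (2.86) $$

where

$$ \mathbf{j} = \rho\,\mathbf{v}, \qquad (2.87) $$

and $\mathbf{v}$ is the six-dimensional velocity. Expanding the six-divergence,

$$ \begin{aligned}
\nabla \cdot \mathbf{j} &= \sum_{j=1}^{3} \left[ \frac{\partial (\rho\,\dot{x}_j)}{\partial x_j} + \frac{\partial (\rho\,\dot{P}_j)}{\partial P_j} \right] \\
&= \sum_{j=1}^{3} \left[ \rho \left( \frac{\partial \dot{x}_j}{\partial x_j} + \frac{\partial \dot{P}_j}{\partial P_j} \right) + \left( \frac{\partial \rho}{\partial x_j}\frac{dx_j}{dt} + \frac{\partial \rho}{\partial P_j}\frac{dP_j}{dt} \right) \right],
\end{aligned} \qquad (2.88) $$

where the dot signifies total time derivative. The first term on the
right vanishes by Hamilton’s equations (2.85). It follows that

$$ \nabla \cdot \mathbf{j} + \frac{\partial \rho}{\partial t} = \sum_{j=1}^{3} \left( \frac{\partial \rho}{\partial x_j}\frac{dx_j}{dt} + \frac{\partial \rho}{\partial P_j}\frac{dP_j}{dt} \right) + \frac{\partial \rho}{\partial t}, \qquad (2.89) $$

where we recognize the right side as the total time derivative $d\rho/dt$.
From (2.86, 2.89) it follows that

$$ \frac{d\rho}{dt} = 0. \qquad (2.90) $$

This is called Liouville’s theorem. It means that $\rho$ = const, and
the density of trajectory points in phase space is conserved.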
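
Liouville's theorem can be illustrated numerically in one dimension. The sketch below is an illustration under stated assumptions, not part of the original derivation: it uses a hypothetical pendulum-like Hamiltonian H = P^2/2 - cos(x) in arbitrary units, advances three neighboring phase space points with a leapfrog scheme built directly from Hamilton's equations (2.85), and compares the area of the small triangle they span before and after the motion.

```python
import math

def leapfrog(x, P, dt, n_steps):
    """Advance (x, P) under the assumed Hamiltonian H = P**2/2 - cos(x)."""
    for _ in range(n_steps):
        P -= 0.5 * dt * math.sin(x)   # dP/dt = -dH/dx (half step)
        x += dt * P                   # dx/dt = +dH/dP (full step)
        P -= 0.5 * dt * math.sin(x)   # dP/dt = -dH/dx (half step)
    return x, P

def triangle_area(points):
    """Area of the triangle spanned by three phase space points."""
    (x0, P0), (x1, P1), (x2, P2) = points
    return 0.5 * abs((x1 - x0) * (P2 - P0) - (x2 - x0) * (P1 - P0))

eps = 1e-5  # size of the small phase space element
start = [(1.0, 0.0), (1.0 + eps, 0.0), (1.0, eps)]
end = [leapfrog(x, P, dt=0.01, n_steps=1000) for x, P in start]

# A ratio close to 1 means the element was sheared but kept its area,
# so the density of points inside it is unchanged.
area_ratio = triangle_area(end) / triangle_area(start)
print(area_ratio)
```

The triangle changes shape as it moves, but its area stays essentially constant, which is the one-dimensional content of (2.90).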


Applying this to a beam, the geometry is shown schematically
in Figure 2.5.


Figure 2.5: Geometry for brightness conservation.


We imagine particles emitted from an infinitesimal
area element dA into an infinitesimal solid angle element $d\Omega$ centered
around the kinetic momentum vector $\mathbf{p}$. The phase space
density is given locally by

$$ \rho(\mathbf{x}, \mathbf{P}) = \frac{d^6 N}{d^3x\,d^3P} = \text{const}, \qquad (2.91) $$

where

$$ d^3x = v \cos\theta\,dt\,dA, \qquad d^3P = p^2\,dp\,d\Omega, \qquad (2.92) $$

and the scalar kinetic momentum p is related to the velocity v
by (2.18). Passing to the limit of an infinitesimally thin volume
element in the z-axis, the density $\rho$ is a delta function in z. Integrating
over all z, the result is unity, by the property of the delta
function. It follows that

$$ \frac{1}{p^2}\,\frac{dN}{dp\,dA\,d\Omega} = \text{const}. \qquad (2.93) $$

We consider the special case of a monoenergetic beam with single
value p. In this case the density is a delta function in p. Integrating
over all p, we obtain

$$ \frac{1}{p^2}\,\frac{dN}{dA\,d\Omega} = \text{const}. \qquad (2.94) $$

We define the brightness $\beta$ as the density of trajectories per unit
transverse area per unit solid angle, so that

$$ \frac{\beta}{p^2} = \text{const}. \qquad (2.95) $$

The ratio of brightness to square of the relativistic kinetic momentum
is conserved. This reproduces the result (2.71) found above.


Solving (2.29) for the scalar kinetic momentum p in terms of the
kinetic energy T,

$$ p^2 = 2m\left(T + \frac{T^2}{2mc^2}\right) = 2meV^*, \qquad (2.96) $$

where we have defined a quantity $V^*$, referred to by many authors
as the relativistic beam voltage, in which case

$$ \frac{\beta}{V^*} = \text{const}. \qquad (2.97) $$


As a result of this, it follows that a beam can never be focused 
to a spot which is brighter than the source. This has the practical 
consequence that the source brightness represents a fundamentally 
important property of any optical system. 
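
As a rough numerical illustration of (2.96) and (2.97), the sketch below evaluates the relativistic beam voltage for electrons and uses it to scale a brightness value from one beam energy to another. It is a sketch under stated assumptions: the function names are hypothetical, the kinetic energy is given in eV so that T/e is numerically the accelerating voltage, and the electron rest energy is a rounded reference value.

```python
ELECTRON_REST_ENERGY_EV = 510998.95  # m c^2 for the electron in eV (rounded)

def relativistic_beam_voltage(kinetic_energy_ev, rest_energy_ev=ELECTRON_REST_ENERGY_EV):
    """V* in volts from (2.96): e V* = T (1 + T / (2 m c^2)), with T in eV."""
    return kinetic_energy_ev * (1.0 + kinetic_energy_ev / (2.0 * rest_energy_ev))

def scaled_brightness(beta_source, source_energy_ev, target_energy_ev):
    """Brightness at another beam energy, assuming beta / V* = const (2.97)."""
    return (beta_source
            * relativistic_beam_voltage(target_energy_ev)
            / relativistic_beam_voltage(source_energy_ev))

v_star_200kev = relativistic_beam_voltage(2.0e5)
ratio_100_to_200 = scaled_brightness(1.0, 1.0e5, 2.0e5)
print(v_star_200kev, ratio_100_to_200)
```

Because of the relativistic correction, doubling the beam energy from 100 keV to 200 keV raises the attainable brightness by somewhat more than a factor of two in this sketch.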


2.4 General curvilinear axis


For many systems, it is convenient to formulate the optics in terms 
of transverse coordinates in a plane which is locally perpendicular 
to a central optic axis. As this axis need not be a straight line, 
we designate it a general curvilinear axis. We designate an axial 
coordinate z, and transverse Cartesian coordinates $x_j = (x, y)$ for
j = (1,2) in a plane locally perpendicular to the axis. We further
designate ray slope components $x'_j = (x', y') = dx_j/dz$. A ray is
completely specified at any plane z by its two-vector transverse
position $\mathbf{x}$ and its two-vector slope $\mathbf{x}'$.


The central problem in this formulation may be stated as follows:
given the transverse position $\mathbf{x}_a$ and slope $\mathbf{x}'_a$ at an arbitrary
starting axial coordinate $z_a$, find the transverse position $\mathbf{x}_b$ and
slope $\mathbf{x}'_b$ at an arbitrary ending axial coordinate $z_b$. This is shown
schematically in Figure 2.6.


Figure 2.6: General curvilinear axis.


It is implicit here and in the following that the slope $\mathbf{x}'_b$ be finite.
This excludes the case of a particle
mirror, for which the slope is infinite where the ray turns around.
In fact, both the position and slope are considered small in the
following. Equivalently, we will only investigate rays that remain
close to the central axis.


Our purpose here is to identify the equations of motion, and de- 
scribe a general methodology for solving them. This solution can 
later be applied to a large variety of specific cases, describing a 
similar variety of phenomena observed in practice. The reader is 
referred to the references by Sturrock [86], Rose [75], Hawkes and 
Kasper [43, 44, 45], and Wollnik [93] for further detail and elabo- 
ration. The present analysis is based on the earlier works of Glaser 
[33] and Sturrock [86]. 


2.4.1 Equation of motion in terms of transverse coordinates and slopes


We found previously that the optical path length along a ray joining
two endpoints $\mathbf{x}_a$ and $\mathbf{x}_b$ is given by the action integral (2.33)
as

$$ W_{ab} = \int_{\mathbf{x}_a}^{\mathbf{x}_b} n\,ds = \int_{z_a}^{z_b} m\,dz, \qquad (2.98) $$

where we have defined a modified refractive index m as

$$ m(\mathbf{x}, \mathbf{x}'; z) = n\,\frac{ds}{dz} = n\sqrt{1 + x'^2 + y'^2}, \qquad (2.99) $$

where $\mathbf{x}$ and $\mathbf{x}'$ are the two-dimensional vector position and slope
components in the transverse plane, respectively, and where the
prime represents differentiation with respect to z. The variation of
optical path length is given by

$$ \delta W_{ab} = \delta \int_{z_a}^{z_b} m\,dz = \int_{z_a}^{z_b} (\delta m)\,dz = 0. \qquad (2.100) $$

We have purposely excluded the second term in square brackets
of (2.47). This is equivalent to assuming the variation of the path
length along the optic axis is zero. In order for this to be meaningful,
the optic axis must itself be a physical ray in the sense of
satisfying (2.58).


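Expanding the variation $\delta m$,

$$ \delta m = \sum_{j=1}^{2} \left( \frac{\partial m}{\partial x_j}\,\delta x_j + \frac{\partial m}{\partial x'_j}\,\delta x'_j \right). \qquad (2.101) $$

Using the product rule, we find that

$$ \frac{d}{dz}\left( \frac{\partial m}{\partial x'_j}\,\delta x_j \right) = \frac{d}{dz}\left( \frac{\partial m}{\partial x'_j} \right)\delta x_j + \frac{\partial m}{\partial x'_j}\,\delta x'_j. \qquad (2.102) $$

This leads to

$$ \delta W_{ab} = \sum_{j=1}^{2} \left[ \frac{\partial m}{\partial x'_j}\,\delta x_j \right]_{z_a}^{z_b} + \int_{z_a}^{z_b} \sum_{j=1}^{2} \delta x_j \left( \frac{\partial m}{\partial x_j} - \frac{d}{dz}\frac{\partial m}{\partial x'_j} \right) dz = 0. \qquad (2.103) $$

Assuming the endpoints are fixed, $\delta x_j = 0$ at $z_a$ and $z_b$, the square
bracket vanishes. Furthermore, since $\delta x_j$ under the integral is arbitrary,
the large parenthesis must vanish, and

$$ \frac{\partial m}{\partial x_j} - \frac{d}{dz}\left( \frac{\partial m}{\partial x'_j} \right) = 0 \qquad (2.104) $$

for j = 1,2. This represents a coupled pair of Euler-Lagrange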
equations. They are the exact ray equations for a single particle in 
the case of a general curvilinear axis. They can be solved in prin- 
ciple for the transverse position x and the transverse component 
of the ray slope x’ in terms of the axial coordinate z. This is a 
necessary condition for a path of physically allowable motion; i.e., 
for the path to be a ray. 


The choice of coordinates $x_j$ and slopes $x'_j$ remains arbitrary.
For example, one could choose Cartesian coordinates x(z) and
y(z) in the local transverse plane. Alternatively, one could treat
the local transverse plane as the complex plane with coordinates
$u(z) = x(z) + iy(z)$ and $\bar{u}(z) = x(z) - iy(z)$. Alternatively, one
could choose polar coordinates r(z) and $\theta(z)$ in the local transverse
plane. The best choice is the one which allows one to express
the problem in the simplest possible way.


We can write

$$ m = n\,\frac{ds}{dz} = \left( P_x \frac{dx}{ds} + P_y \frac{dy}{ds} + P_z \frac{dz}{ds} \right)\frac{ds}{dz} = P_x x' + P_y y' + P_z, \qquad (2.105) $$

from which it follows

$$ \frac{\partial m}{\partial x'_j} = P_j \qquad (2.106) $$

for j = 1,2, where $P_x$ and $P_y$ are the transverse components of
canonical momentum. The Euler-Lagrange equations can therefore
be written as

$$ \frac{\partial m}{\partial x_j} = \frac{dP_j}{dz}, \qquad (2.107) $$

in analogy with (2.58). Considering two rays which are infinitesimally
displaced from one another, the differential in optical path
between the rays is (2.103, 2.106)

$$ \delta W_{ab} = \sum_{j=1}^{2} \left( P_{bj}\,\delta x_{bj} - P_{aj}\,\delta x_{aj} \right). \qquad (2.108) $$

In general $\delta W_{ab}$ is non-zero, since the endpoints $x_{aj}$ and $x_{bj}$ can be
independently displaced between the two rays by $\delta x_{aj}$ and $\delta x_{bj}$,
respectively.


Since $\delta W_{ab}$ is an exact differential, it follows that

$$ P_{aj} = -\frac{\partial W_{ab}}{\partial x_{aj}}, \qquad P_{bj} = \frac{\partial W_{ab}}{\partial x_{bj}}. \qquad (2.109) $$

Physically, this means that the transverse component of canonical
momentum is perpendicular to the contour lines of constant
optical path $W_{ab}$ in any transverse plane. Again, the coordinates
$x_j(z)$ and slopes $x'_j(z)$ in (2.104) should be regarded as completely
general. In the following sections it will prove expedient to move
freely between alternative coordinate systems.


2.4.2 Natural units 


The discussion in the following few sections will be somewhat simplified
by expressing the variables in alternative units, which are
derived from SI units. The scalar kinetic momentum p can be written
as

$$ \tilde{p} = \frac{p}{mc}. \qquad (2.110) $$

Since the total energy needs only to be expressed to within an arbitrary
additive constant, we are free to define the zero of potential
energy. Let

$$ T + q\phi = 0, \qquad (2.111) $$

where T is the kinetic energy, and where q = −e for the electron
charge. This is consistent with energy conservation in the case
where the electromagnetic potentials have no explicit time dependence.
Physically, the zero of potential energy is here defined at a
position where the particle has zero kinetic energy, i.e., is at rest.
This position might coincide with the emission surface, but this
need not necessarily be the case. The quantity $\phi$ thus represents
both the electrostatic potential, and the kinetic energy of the particle,
only for this particular choice of the zero of potential energy.
Many workers call $\phi$ the beam voltage at any given position in the
optical system. We define a dimensionless quantity


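$$ \tilde{\phi} = -\frac{q\phi}{mc^2} = \frac{T}{mc^2}. \qquad (2.112) $$

The magnetic vector potential can be written in dimensionless
form as

$$ \tilde{\mathbf{A}} = -\frac{q\mathbf{A}}{mc}. \qquad (2.113) $$

The velocity, space charge density, and space current density can
be written as

$$ \tilde{\mathbf{v}} = \frac{\mathbf{v}}{c}, \qquad \tilde{\rho} = -\frac{q}{mc^2\epsilon_0}\,\rho, \qquad \tilde{\mathbf{j}} = -\frac{q}{mc}\,\mu_0\,\mathbf{j}, \qquad \tilde{\mathbf{j}} = \tilde{\rho}\,\tilde{\mathbf{v}}, \qquad (2.114) $$

respectively, where $\tilde{\rho} < 0$, regardless of the sign of the charge q.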


The rest energy plus the kinetic energy is given in dimensionless
units by

$$ \gamma = 1 + \tilde{\phi} = \sqrt{1 + \tilde{p}^2}, \qquad (2.115) $$

where the rest energy $mc^2$ is unity in these units. Solving this for
the scalar kinetic momentum $\tilde{p}$, we obtain (2.115)

$$ \tilde{p} = \sqrt{2\tilde{\phi} + \tilde{\phi}^2}, \qquad (2.116) $$

where $\tilde{\phi}$ and $\tilde{p}$ can be regarded as functions of the coordinates
$x_j$ only. This is due to the fact that the zero of potential energy
is fixed (2.111). In the nonrelativistic limit, the kinetic energy is
small relative to the rest energy, as follows:

$$ \tilde{\phi} \ll 1. \qquad (2.117) $$


In the following discussion, we will not make this approximation, 
but rather retain the full relativistic form throughout. 


All quantities are dimensionless except coordinates and time, 
which retain their SI units of meters and seconds, respectively. 
One can easily return to SI units at any point in a calculation by 
inverting the above transformations. Many calculations seek posi- 
tion, such as the path of a ray, or the deviation of the path from its 
paraxial or Gaussian approximation. In such cases, it is not necessary
to convert back to SI units for the result to be practical. We
therefore refer to these units as natural units. Unless specifically
noted, we will use these units throughout the following section
describing the special case with axial symmetry, and drop the tilde.
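
The conversion to natural units can be sketched as follows. This is a minimal illustration rather than a prescribed procedure: it assumes an electron beam, takes the kinetic energy in eV, and uses a hypothetical function name. It evaluates the dimensionless potential as the kinetic energy divided by the rest energy, then applies (2.115) and (2.116).

```python
import math

ELECTRON_REST_ENERGY_EV = 510998.95  # m c^2 for the electron in eV (rounded)

def to_natural_units(kinetic_energy_ev, rest_energy_ev=ELECTRON_REST_ENERGY_EV):
    """Return (phi, gamma, p, v) in natural units for a given kinetic energy."""
    phi = kinetic_energy_ev / rest_energy_ev   # kinetic energy in units of m c^2
    gamma = 1.0 + phi                          # total energy, (2.115)
    p = math.sqrt(2.0 * phi + phi * phi)       # momentum in units of m c, (2.116)
    v = p / gamma                              # velocity in units of c
    return phi, gamma, p, v

phi, gamma, p, v = to_natural_units(1.0e5)     # 100 keV electrons
print(phi, gamma, p, v)
```

Multiplying p by mc, or v by c, returns the SI values, which is the inversion of the transformations mentioned above.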


2.5 Axial symmetry 


Systems with a straight optic axis, where the potentials $\mathbf{A}$ and
$\phi$ are axially symmetric, represent an important special case of
the general curvilinear axis. This includes a large class of useful 
instruments, including electron and ion microscopes. It excludes 
curved-axis energy analyzers. The reader is referred to Rose [75], 
Hawkes and Kasper [43, 44], and Wollnik [93] for further detail 
and elaboration, both of axially symmetric and nonsymmetric sys- 
tems. 


2.5.1 Exact equations of motion for axially 
symmetric fields 


In the absence of space charge, the electrostatic potential $\phi$ satisfies
Laplace’s equation,

$$ \nabla^2 \phi = 0. \qquad (2.118) $$

In cylindrical coordinates this becomes

$$ \left( \frac{\partial^2}{\partial r^2} + \frac{1}{r}\frac{\partial}{\partial r} + \frac{\partial^2}{\partial z^2} \right)\phi = 0. \qquad (2.119) $$

We propose a series solution by the method of undetermined coefficients
[74]. We assume that $\phi$ can be expanded in a series representation
given by

$$ \phi(r, z) = a_0(z) + a_2(z)\,r^2 + a_4(z)\,r^4 + \ldots, \qquad (2.120) $$

where we will now proceed to solve for the coefficients $a_j$. The
kinetic energy of a particle on axis is

$$ a_0(z) = \phi(0, z) = \Phi(z). \qquad (2.121) $$

From (2.119, 2.120, 2.121) it follows that

$$ \phi(r, z) = \Phi - \tfrac{1}{4}\Phi''\,r^2 + \tfrac{1}{64}\Phi''''\,r^4 + \ldots, \qquad (2.122) $$

where primes indicate differentiation with respect to z. Expanding
the scalar kinetic momentum p we obtain (2.116, 2.122):

$$ \begin{aligned}
p(r, z) &= \sqrt{2\phi + \phi^2} \\
&= p - \tfrac{1}{4}\,p^{-1}\Phi''(1+\Phi)\,r^2 \\
&\quad + \left[ \tfrac{1}{64}\,p^{-1}\Phi''''(1+\Phi) - \tfrac{1}{32}\,p^{-3}\Phi''^2 \right] r^4 + \ldots,
\end{aligned} \qquad (2.123) $$

where we have defined a quantity p(z) as the scalar kinetic momentum
on axis as follows (2.121, 2.122):

$$ p(z) = p(0, z) = \sqrt{2\Phi + \Phi^2}. \qquad (2.124) $$
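
The series (2.122) can be spot-checked against Laplace's equation (2.119). The sketch below is illustrative only: it assumes a hypothetical axial potential Phi(z) = 1 + z^4, chosen because its sixth derivative vanishes so that the series terminates at the r^4 term, and evaluates the cylindrical Laplacian by central finite differences.

```python
def axial_potential(z):
    """Hypothetical axial potential Phi(z), chosen so the series terminates."""
    return 1.0 + z ** 4

def axial_potential_d2(z):
    """Second derivative of the assumed Phi(z)."""
    return 12.0 * z ** 2

def axial_potential_d4(z):
    """Fourth derivative of the assumed Phi(z)."""
    return 24.0

def phi(r, z):
    """Off-axis potential from the series (2.122), through order r**4."""
    return (axial_potential(z)
            - 0.25 * axial_potential_d2(z) * r ** 2
            + axial_potential_d4(z) * r ** 4 / 64.0)

def cylindrical_laplacian(f, r, z, h=1e-3):
    """Central-difference estimate of the left side of (2.119)."""
    f_rr = (f(r + h, z) - 2.0 * f(r, z) + f(r - h, z)) / h ** 2
    f_r = (f(r + h, z) - f(r - h, z)) / (2.0 * h)
    f_zz = (f(r, z + h) - 2.0 * f(r, z) + f(r, z - h)) / h ** 2
    return f_rr + f_r / r + f_zz

residual = cylindrical_laplacian(phi, 0.3, 0.7)
print(residual)
```

The residual is limited only by the finite-difference step, which indicates that the coefficients -1/4 and 1/64 are the ones required by (2.119).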


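Separately, the magnetic field $\mathbf{B}$ is given in terms of the magnetic
vector potential $\mathbf{A}$ as

$$ \mathbf{B} = \nabla \times \mathbf{A} = -\frac{\partial A_\theta}{\partial z}\,\hat{\mathbf{r}} + \left( \frac{\partial A_\theta}{\partial r} + \frac{A_\theta}{r} \right)\hat{\mathbf{z}}. \qquad (2.125) $$

In the absence of space current, Maxwell’s equation is

$$ \nabla \times \mathbf{B} = 0, \qquad (2.126) $$

from which it follows that (2.125, 2.126)

$$ \left( \frac{\partial^2}{\partial r^2} + \frac{1}{r}\frac{\partial}{\partial r} - \frac{1}{r^2} + \frac{\partial^2}{\partial z^2} \right) A_\theta = 0. \qquad (2.127) $$

We assume a series representation for $A_\theta$ as follows:

$$ A_\theta(r, z) = b_1(z)\,r + b_3(z)\,r^3 + b_5(z)\,r^5 + \ldots, \qquad (2.128) $$

where we now proceed to solve for the coefficients $b_j$. We define a
function B(z) as the magnetic field on axis,

$$ B(z) = B_z(0, z) = 2 b_1, \qquad (2.129) $$

leading to

$$ A_\theta(r, z) = \tfrac{1}{2} B\,r - \tfrac{1}{16} B''\,r^3 + \tfrac{1}{384} B''''\,r^5 - \ldots \qquad (2.130) $$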


The modified refractive index m is written as

$$ \begin{aligned}
m &= n\,\frac{ds}{dz} \\
&= \left( p - \frac{v_\theta}{v}\,A_\theta \right)\frac{ds}{dz} \\
&= p\sqrt{1 + r'^2 + r^2\theta'^2} - \frac{r\,d\theta/dt}{ds/dt}\,A_\theta\,\frac{ds}{dz} \\
&= p\sqrt{1 + r'^2 + r^2\theta'^2} - r\theta' A_\theta.
\end{aligned} \qquad (2.131) $$

The Euler-Lagrange equation for the angular coordinate is (2.104, 2.131)

$$ \frac{\partial m}{\partial \theta} - \frac{d}{dz}\frac{\partial m}{\partial \theta'} = 0, \qquad (2.132) $$

where, because of axial symmetry,

$$ \frac{\partial m}{\partial \theta} = 0. \qquad (2.133) $$

From (2.131) we obtain

$$ \frac{p\,r^2\theta'}{\sqrt{1 + r'^2 + r^2\theta'^2}} - r A_\theta = C = \text{const}, \qquad (2.134) $$

where C is identified (2.106, 2.134) as the conserved canonical angular
momentum. In the case where C = 0, the ray intersects the
optic axis at some point. Such a ray has no angular momentum,
and is called a meridional ray. In the case where $C \neq 0$, the ray
has angular momentum, and does not intersect the optic axis. Such
a ray is called a skew ray, with C as a measure of skewness.
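
The distinction between meridional and skew rays can be made concrete by evaluating the invariant (2.134) for a given ray state. The following sketch is illustrative: the function name and the numerical values are assumptions, all quantities are in natural units, and the vector potential is taken from the leading term of (2.130), A_theta = B r / 2.

```python
import math

def canonical_angular_momentum(r, r_prime, theta_prime, p, A_theta):
    """Evaluate the conserved quantity C of (2.134) for one ray state."""
    s = math.sqrt(1.0 + r_prime ** 2 + (r * theta_prime) ** 2)
    return p * r ** 2 * theta_prime / s - r * A_theta

# Assumed values: near-axis ray in a locally uniform axial field B.
p, B, r = 0.6, 0.1, 1.0e-3
A_theta = 0.5 * B * r

# A ray whose azimuthal slope is B / (2 p) has C very close to zero,
# so it behaves as a meridional ray that rotates with the field.
C_meridional = canonical_angular_momentum(r, 0.0, B / (2.0 * p), p, A_theta)

# A larger azimuthal slope leaves a non-zero C: a skew ray.
C_skew = canonical_angular_momentum(r, 0.0, 0.2, p, A_theta)
print(C_meridional, C_skew)
```

Since C is conserved along the ray, evaluating it once at the starting plane is enough to classify the ray.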
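The Euler-Lagrange equation for the radial coordinate is (2.104)

$$ \frac{\partial m}{\partial r} - \frac{d}{dz}\frac{\partial m}{\partial r'} = 0. \qquad (2.135) $$

This leads (2.131, 2.135) to the exact ray equation for the radial
coordinate in the case of axial symmetry as follows:

$$ \begin{aligned}
\frac{d}{dz}\left\{ \frac{r'}{(1 + r'^2)^{1/2}} \left[ p^2 - (C/r + A_\theta)^2 \right]^{1/2} \right\}
&= \left[ \frac{1 + r'^2}{p^2 - (C/r + A_\theta)^2} \right]^{1/2} \\
&\quad \times \left[ p\,\frac{\partial p}{\partial r} + \left( \frac{C}{r} + A_\theta \right)\left( \frac{C}{r^2} - \frac{\partial A_\theta}{\partial r} \right) \right],
\end{aligned} \qquad (2.136) $$

recalling that p is the scalar kinetic momentum (2.123), and $A_\theta$ is
the magnitude of the magnetic vector potential (2.130). Both p
and $A_\theta$ are assumed to be known functions of the coordinates.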


The differential equations (2.134, 2.136) are a coupled pair, for
which the desired solutions r(z) and $\theta(z)$ are exact in principle.
These equations were derived by Sturrock [86]. Because the equations
are nonlinear, the solutions r(z) and $\theta(z)$ cannot be expressed
in a simple, closed form. Consequently, an analytical solution must
rely on finding a suitable approximation. Alternatively, these equations
can be integrated numerically to any desired accuracy.


2.5.2 Paraxial approximation, Gaussian optics 


Assuming

$$ r'^2 \ll 1, \qquad C = 0 \qquad (2.137) $$

in (2.136), and retaining only terms through order r, one obtains
(2.123, 2.130, 2.136, 2.137) the approximation

$$ \frac{2+\Phi}{1+\Phi}\,\Phi\,r'' + \Phi'\,r' + \frac{1}{2}\,\Phi''\,r + \frac{B^2}{4(1+\Phi)}\,r = 0. \qquad (2.138) $$

This is a linear second order equation for the radial position r(z)
of a meridional ray. It is only accurate for rays close to the optic
axis, and this approximation is therefore known as the paraxial
approximation. A purely electrostatic field has B = 0, and a purely
magnetic field has $\Phi$ = const.


This equation can be integrated in principle by first seeking an 
integrating factor. To this end we define a reduced ray [33, 71] 


$$ R(z) = [\Phi(z)]^{1/4}\,r(z). \qquad (2.139) $$

Substituting this into (2.138), we obtain a reduced equation

$$ R''(z) + Q(z)\,R(z) = 0, \qquad (2.140) $$

where we have defined a function Q(z) as

$$ Q(z) = \frac{3}{16}\left( \frac{\Phi'}{\Phi} \right)^2 \frac{1+\Phi}{1+\Phi/2} + \frac{B^2}{8\Phi(1+\Phi/2)}. \qquad (2.141) $$

The region where Q is non-zero constitutes a lens, completely analogous
to a lens in light optics, with the difference that the boundaries
for the focusing region are not sharply delineated.


Since Q(z) is nowhere negative, it follows that R''(z) has the
opposite sign to R(z). The reduced ray R(z) therefore always bends
toward the optic axis. This
is not necessarily true for the actual ray r(z), which can curve
away from the axis within a region with electric field.


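We can define a forward focal length $f_+$ for the lens, where rays
enter parallel to the optic axis at radius $r_{-\infty}$, and exit with slope
$r'_{+\infty}$:

$$ \frac{1}{f_+} = -\frac{r'_{+\infty}}{r_{-\infty}}, \quad \text{where } r'_{-\infty} = 0. \qquad (2.142) $$

For many systems it is the case that the radial position of the
ray is roughly constant within the lens field, i.e., $R(z) \approx$ const.
Such a lens is called a thin lens. It is also often the case that the
electrostatic component of the focusing is weak or nonexistent;
i.e., $r' \approx \Phi^{-1/4} R'$. Such a lens is called a weak lens. Using these
approximations, we obtain (2.139)

$$ \frac{1}{f_+} = -\left( \frac{\Phi_{-\infty}}{\Phi_{+\infty}} \right)^{1/4} \frac{R'_{+\infty}}{R_{-\infty}}. \qquad (2.143) $$

From (2.140) we obtain

$$ R'_{+\infty} = \int_{-\infty}^{\infty} R''\,dz = -\int_{-\infty}^{\infty} Q\,R\,dz \approx -R_{-\infty} \int_{-\infty}^{\infty} Q\,dz. \qquad (2.144) $$

From (2.141, 2.143, 2.144) we obtain

$$ \frac{1}{f_+} = \left( \frac{\Phi_{-\infty}}{\Phi_{+\infty}} \right)^{1/4} \int_{-\infty}^{\infty} \left[ \frac{3}{16}\left( \frac{\Phi'}{\Phi} \right)^2 \frac{1+\Phi}{1+\Phi/2} + \frac{B^2}{8\Phi(1+\Phi/2)} \right] dz, \qquad (2.145) $$

where the first term on the right represents the electrostatic focusing,
and the second term represents the magnetic focusing. Similarly,
we define a reverse focal length $f_-$, where rays enter parallel
to the optic axis at radius $r_{+\infty}$, and exit with slope $r'_{-\infty}$:

$$ \frac{1}{f_-} = \frac{r'_{-\infty}}{r_{+\infty}} = \left( \frac{\Phi_{+\infty}}{\Phi_{-\infty}} \right)^{1/4} \int_{-\infty}^{\infty} Q\,dz, \quad \text{where } r'_{+\infty} = 0. \qquad (2.146) $$

The axial positions of principal planes follow directly from $f_+$ and
$f_-$.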


The quantity 1/f represents the focal strength of a lens. In the
purely electrostatic case where B = 0, the focal strength is proportional
to the charge q, and independent of the mass m, taking
account of the dimensionless units. In the purely magnetic case
where $\Phi' = 0$, the focal strength is proportional to the ratio
q/m. Consequently, it is more efficient to use electrostatic lenses
for heavier particles, such as ions, and magnetic lenses for lighter
particles, such as electrons.
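
The thin, weak lens estimate of the focal strength can be compared with a direct numerical ray trace. The sketch below is an illustration under stated assumptions: it treats a purely magnetic lens (constant potential, so that the paraxial equation reduces to r'' + Q r = 0 with Q = B^2 / (4 p^2)), uses a hypothetical bell-shaped axial field B(z) = B0 / (1 + (z/a)^2) in natural units, and integrates with a classical fourth-order Runge-Kutta scheme.

```python
import math

def trace_parallel_ray(Q, z_start, z_end, n_steps=20000):
    """Integrate r'' + Q(z) r = 0 with RK4 for a ray entering parallel at r = 1."""
    dz = (z_end - z_start) / n_steps
    z, r, s = z_start, 1.0, 0.0   # s is the slope r'
    for _ in range(n_steps):
        k1r, k1s = s, -Q(z) * r
        k2r, k2s = s + 0.5 * dz * k1s, -Q(z + 0.5 * dz) * (r + 0.5 * dz * k1r)
        k3r, k3s = s + 0.5 * dz * k2s, -Q(z + 0.5 * dz) * (r + 0.5 * dz * k2r)
        k4r, k4s = s + dz * k3s, -Q(z + dz) * (r + dz * k3r)
        r += dz * (k1r + 2.0 * k2r + 2.0 * k3r + k4r) / 6.0
        s += dz * (k1s + 2.0 * k2s + 2.0 * k3s + k4s) / 6.0
        z += dz
    return r, s

# Assumed lens parameters (natural units): peak field, half-width, momentum.
B0, a, p = 0.3, 1.0, 1.0

def Q(z):
    """Focusing function of a purely magnetic lens: B(z)**2 / (4 p**2)."""
    B = B0 / (1.0 + (z / a) ** 2)
    return B * B / (4.0 * p * p)

# Thin, weak lens estimate: 1/f is the integral of Q over the axis,
# which for this field shape is pi * a * B0**2 / (8 * p**2).
inv_f_thin = math.pi * a * B0 ** 2 / (8.0 * p ** 2)

# Numerical estimate: minus the exit slope of a ray entering at unit radius.
_, exit_slope = trace_parallel_ray(Q, -50.0 * a, 50.0 * a)
inv_f_numeric = -exit_slope
print(inv_f_thin, inv_f_numeric)
```

For a weak field the two values agree to within a few percent; the numerical value is slightly smaller because the ray has already moved toward the axis inside the field, which the thin lens approximation neglects.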


2.5.3 Series solution for the general ray equation


We now seek a general solution to the exact ray equation (2.104). 
This must include all rays, including meridional and skew rays. 
Because the exact trajectory equations (2.134, 2.136) are nonlin- 
ear, they cannot be solved in closed form. Consequently, we seek 
an approximate solution by series expansion [33]. 


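We recall the modified refractive index (2.131),

$$ m = p\sqrt{1 + r'^2 + r^2\theta'^2} - r\theta' A_\theta, \qquad (2.147) $$

where the scalar kinetic momentum p(r, z) is given by (2.123) and
the magnetic vector potential $A_\theta$ is given by (2.130).


We define a complex transverse coordinate

$$ u = X + iY = r e^{i\theta}, \qquad \bar{u} = X - iY = r e^{-i\theta}, \qquad (2.148) $$

where X(z) and Y(z) are Cartesian coordinates in a transverse
plane at axial coordinate z. It follows that

$$ i(\bar{u}'u - \bar{u}u') = 2(XY' - X'Y) = 2r^2\theta', \qquad (2.149) $$

and

$$ \sqrt{1 + r'^2 + r^2\theta'^2} = \sqrt{1 + \bar{u}'u'} = 1 + \tfrac{1}{2}\bar{u}'u' - \tfrac{1}{8}(\bar{u}'u')^2 + \ldots \qquad (2.150) $$

We can write a power series expansion for the refractive index,
making use of the axial symmetry of the scalar kinetic momentum
p(r, z) and the magnetic vector potential $A_\theta(r, z)$ in (2.123, 2.130)
as follows:

$$ \begin{aligned}
m &= m_0 + m_2 + m_4 + \ldots \\
&= p \\
&\quad + \left[ -\tfrac{1}{4}\,p^{-1}\Phi''(1+\Phi) \right] \bar{u}u \\
&\quad + \left[ \tfrac{1}{2}\,p \right] \bar{u}'u' \\
&\quad + \left[ -\tfrac{1}{4}\,B \right] i(\bar{u}'u - \bar{u}u') \\
&\quad + \left[ -\tfrac{1}{8}\,p \right] (\bar{u}'u')^2 \\
&\quad + \left[ -\tfrac{1}{8}\,p^{-1}\Phi''(1+\Phi) \right] \bar{u}u\,\bar{u}'u' \\
&\quad + \left[ \tfrac{1}{64}\,p^{-1}\Phi''''(1+\Phi) - \tfrac{1}{32}\,p^{-3}\Phi''^2 \right] (\bar{u}u)^2 \\
&\quad + \left[ \tfrac{1}{32}\,B'' \right] i(\bar{u}'u - \bar{u}u')\,\bar{u}u \\
&\quad + \ldots,
\end{aligned} \qquad (2.151) $$

where the various orders of m are defined by the power of the coordinates
and slope components. The quantities in square brackets
depend only on the fields on axis, embodied in $\Phi(z)$ and B(z).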


The paraxial term is given by (2.151)

$$ m_2 = -\tfrac{1}{4}\,p^{-1}\Phi''(1+\Phi)\,\bar{u}u + \tfrac{1}{2}\,p\,\bar{u}'u' - \tfrac{1}{4}\,B\,i(\bar{u}'u - \bar{u}u'). \qquad (2.152) $$

We now define the paraxial approximation by retaining only terms
through order $m_2$ in (2.104, 2.151). The paraxial ray equation is
then given by

$$ \frac{\partial m_2}{\partial \bar{u}} - \frac{d}{dz}\frac{\partial m_2}{\partial \bar{u}'} = 0. \qquad (2.153) $$

Substituting (2.152) into (2.153) we obtain

$$ \frac{d}{dz}(p\,u') - iB\,u' + \tfrac{1}{2}\left[ p^{-1}\Phi''(1+\Phi) - iB' \right] u = 0. \qquad (2.154) $$

The imaginary terms correspond physically to a rotation of the
bundle of rays about the optic axis as a function of the axial coordinate
z. Physically, this arises from the Lorentz force (2.15),
where the axial component of the magnetic field acts on the transverse
component of the particle velocity.


It is possible to rotate the coordinate system to compensate for
this. We define a rotated complex coordinate $v(z) = x(z) + iy(z)$
as follows:

$$ u(z) = v(z)\,e^{i\chi(z)}, \qquad (2.155) $$

where, by definition,

$$ \chi' = \tfrac{1}{2}\,p^{-1} B. \qquad (2.156) $$

The rotation angle is then given by

$$ \chi_{ab} = \int_{z_a}^{z_b} \tfrac{1}{2}\,p^{-1} B\,dz. \qquad (2.157) $$
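
The rotation angle (2.157) is a simple quadrature of the axial field. The sketch below is illustrative: it assumes a purely magnetic lens with constant momentum p, the same hypothetical bell-shaped field B(z) = B0 / (1 + (z/a)^2) in natural units, and a trapezoidal rule; the closed form used for comparison follows from the elementary integral of 1 / (1 + z^2).

```python
import math

def rotation_angle(B, p, z_a, z_b, n=100000):
    """Trapezoidal estimate of the rotation angle (2.157) for constant p."""
    dz = (z_b - z_a) / n
    total = 0.5 * (B(z_a) + B(z_b))
    for i in range(1, n):
        total += B(z_a + i * dz)
    return total * dz / (2.0 * p)

# Assumed lens parameters (natural units).
B0, a, p = 0.3, 1.0, 1.0

chi = rotation_angle(lambda z: B0 / (1.0 + (z / a) ** 2), p, -200.0 * a, 200.0 * a)

# Over the whole axis the integral of B is pi * a * B0 for this field shape.
chi_analytic = math.pi * a * B0 / (2.0 * p)
print(math.degrees(chi), math.degrees(chi_analytic))
```

The small difference between the two values comes from truncating the slowly decaying tails of the assumed field at a finite distance.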


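This gives rise to the following useful transformations:

$$ \begin{aligned}
\bar{u}u &= \bar{v}v, \\
\bar{u}'u' &= \bar{v}'v' + \tfrac{1}{2}\,p^{-1}B\,i(\bar{v}'v - \bar{v}v') + \tfrac{1}{4}\,p^{-2}B^2\,\bar{v}v, \\
i(\bar{u}'u - \bar{u}u') &= i(\bar{v}'v - \bar{v}v') + p^{-1}B\,\bar{v}v.
\end{aligned} \qquad (2.158) $$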


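Substituting these into (2.151) we obtain the modified refractive
index in the rotated coordinates as

$$ \begin{aligned}
m &= m_0 + m_2 + m_4 + \ldots \\
&= p \\
&\quad + \left[ \tfrac{1}{2}\,p \right] \bar{v}'v' \\
&\quad + \left[ -\tfrac{1}{4}\,p^{-1}\Phi''(1+\Phi) - \tfrac{1}{8}\,p^{-1}B^2 \right] \bar{v}v \\
&\quad + \left[ \tfrac{1}{64}\,p^{-1}\Phi''''(1+\Phi) - \tfrac{1}{32}\,p^{-3}\Phi''^2 + \tfrac{1}{32}\,p^{-1}BB'' \right. \\
&\qquad \left. - \tfrac{1}{128}\,p^{-3}B^4 - \tfrac{1}{32}\,p^{-3}\Phi''(1+\Phi)B^2 \right] (\bar{v}v)^2 \\
&\quad + \left[ -\tfrac{1}{8}\,p^{-1}\Phi''(1+\Phi) - \tfrac{1}{16}\,p^{-1}B^2 \right] \bar{v}v\,\bar{v}'v' \\
&\quad + \left[ -\tfrac{1}{8}\,p \right] (\bar{v}'v')^2 \\
&\quad + \left[ -\tfrac{1}{32}\,p^{-2}B^3 - \tfrac{1}{16}\,p^{-2}\Phi''(1+\Phi)B + \tfrac{1}{32}\,B'' \right] i(\bar{v}'v - \bar{v}v')\,\bar{v}v \\
&\quad + \left[ -\tfrac{1}{8}\,B \right] i(\bar{v}'v - \bar{v}v')\,\bar{v}'v' \\
&\quad + \left[ -\tfrac{1}{32}\,p^{-1}B^2 \right] \left[ i(\bar{v}'v - \bar{v}v') \right]^2 \\
&\quad + \ldots
\end{aligned} \qquad (2.159) $$

The paraxial term is given in the rotated system by

$$ m_2 = \tfrac{1}{2}\,p\,\bar{v}'v' + \left[ -\tfrac{1}{4}\,p^{-1}\Phi''(1+\Phi) - \tfrac{1}{8}\,p^{-1}B^2 \right] \bar{v}v. \qquad (2.160) $$

The paraxial approximation to (2.104) is then

$$ \frac{\partial m_2}{\partial \bar{v}} - \frac{d}{dz}\frac{\partial m_2}{\partial \bar{v}'} = 0. \qquad (2.161) $$

Substituting (2.160) into (2.161) we obtain the paraxial ray equation
in the rotated system as follows:

$$ \frac{d}{dz}(p\,v') + \left[ \tfrac{1}{2}\,p^{-1}\Phi''(1+\Phi) + \tfrac{1}{4}\,p^{-1}B^2 \right] v = 0. \qquad (2.162) $$

The absence of imaginary terms shows that the rotation has been
removed. It is simpler to work in the rotated system than the unrotated
system, as the image has the same rotation as the object
in the rotated coordinates.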


As a second-order linear differential equation, (2.162) has two linearly
independent solutions, which we denote g(z) and h(z). By
substituting these in turn into (2.162) for v(z) and subtracting the
two equations, it is straightforward to show that

$$ h\,\frac{d}{dz}(p\,g') - g\,\frac{d}{dz}(p\,h') = 0, \qquad (2.163) $$

from which it follows that

$$ \frac{d}{dz}\left[ p\,(g\,h' - g'\,h) \right] = 0. \qquad (2.164) $$

The quantity in square brackets is, therefore, conserved. We denote
this quantity as k, defined as

$$ k = p(z)\left[ g(z)\,h'(z) - g'(z)\,h(z) \right] = \text{const}. \qquad (2.165) $$

The conserved quantity k is called the Wronskian. A more general
expression for the Wronskian exists for a general curvilinear axis.
The reader is referred to Rose [75] for details. It is closely related
to the Lagrange invariant discussed earlier.
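
Conservation of the Wronskian (2.165) can be verified numerically for any pair of independent solutions of the paraxial equation (2.162). The sketch below is an illustration under stated assumptions: the axial potential and field profiles are hypothetical, all quantities are in natural units, and the equation is integrated as a first-order system in the variables v and q = p v' with a fourth-order Runge-Kutta scheme.

```python
import math

def Phi(z):
    """Hypothetical axial potential (natural units)."""
    return 0.2 + 0.1 * math.tanh(z)

def Phi_d2(z):
    """Second derivative of the assumed potential."""
    return -0.2 * math.tanh(z) / math.cosh(z) ** 2

def B(z):
    """Hypothetical axial magnetic field (natural units)."""
    return 0.3 / (1.0 + z * z)

def p_axis(z):
    """Scalar kinetic momentum on axis, (2.124)."""
    return math.sqrt(2.0 * Phi(z) + Phi(z) ** 2)

def rhs(z, v, q):
    """Paraxial equation (2.162) as a first-order system, with q = p v'."""
    p = p_axis(z)
    G = (0.5 * Phi_d2(z) * (1.0 + Phi(z)) + 0.25 * B(z) ** 2) / p
    return q / p, -G * v

def integrate(v, q, z0=-5.0, z1=5.0, n=4000):
    """RK4 integration of (v, q) from z0 to z1."""
    dz = (z1 - z0) / n
    z = z0
    for _ in range(n):
        a1, b1 = rhs(z, v, q)
        a2, b2 = rhs(z + 0.5 * dz, v + 0.5 * dz * a1, q + 0.5 * dz * b1)
        a3, b3 = rhs(z + 0.5 * dz, v + 0.5 * dz * a2, q + 0.5 * dz * b2)
        a4, b4 = rhs(z + dz, v + dz * a3, q + dz * b3)
        v += dz * (a1 + 2.0 * a2 + 2.0 * a3 + a4) / 6.0
        q += dz * (b1 + 2.0 * b2 + 2.0 * b3 + b4) / 6.0
        z += dz
    return v, q

# Two independent solutions, chosen so that k = g (p h') - (p g') h = 1 at the start.
g_end, pg_end = integrate(1.0, 0.0)
h_end, ph_end = integrate(0.0, 1.0)
k_end = g_end * ph_end - pg_end * h_end
print(k_end)
```

The starting conditions here are chosen for convenience rather than to match any particular object or aperture plane; the conservation of k holds for any two independent solutions.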


In order to fully determine the solutions g(z) and h(z), it is necessary
to specify boundary conditions. We choose these arbitrarily
as

$$ g(z_O) = 1, \quad g(z_A) = 0, \quad h(z_O) = 0, \quad h(z_A) = 1, \qquad (2.166) $$

where $z_O$ and $z_A$ are the axial coordinates of the object and aperture
planes, respectively. Given this, a general solution for the
paraxial ray v(z) can be written as

$$ \begin{aligned}
v(z) &= v_O\,g(z) + v_A\,h(z), \\
v'(z) &= v_O\,g'(z) + v_A\,h'(z).
\end{aligned} \qquad (2.167) $$

Remembering v = x + iy in the rotated system, it follows that

$$ \begin{aligned}
x_j(z) &= x_{Oj}\,g(z) + x_{Aj}\,h(z), \\
x'_j(z) &= x_{Oj}\,g'(z) + x_{Aj}\,h'(z),
\end{aligned} \qquad (2.168) $$

where $x_j = (x, y)$ for j = (1, 2).


The choice of aperture plane $z_A$ is arbitrary. Often one chooses
the aperture plane to coincide with a physical aperture, but this
need not be the case. The aperture plane cannot coincide with the
object or image planes, as the solutions g and h would no longer
be independent.


Since the Wronskian is conserved, it retains the same value
throughout the system, and

$$ k = p_O\,h'_O = -p_A\,g'_A = p_I\,M\,h'_I, \qquad (2.169) $$

where the subscripts O, A, and I denote the object plane, aperture
plane, and Gaussian image plane, respectively, and M is the
magnification.


In practice one usually determines the solutions g(z) and h(z) by
solving (2.162) numerically. This requires knowing $\Phi''(z)$ to high
accuracy. Unfortunately this is not always possible, even if one
knows $\Phi(z)$ to high accuracy at discrete points along the axis, because
numerical differentiation introduces error. The dependence
on the second-order derivative can be eliminated by defining a
reduced ray w(z) as follows:

$$ v(z) = [p(z)]^{-1/2}\,w(z). \qquad (2.170) $$

From (2.160, 2.170) we obtain

$$ \begin{aligned}
m_2 &= \tfrac{1}{2}\,\bar{w}'w' \\
&\quad + \left[ -\tfrac{1}{4}\,p^{-2}\Phi''(1+\Phi) + \tfrac{1}{8}\,p^{-4}\Phi'^2(1+\Phi)^2 - \tfrac{1}{8}\,p^{-2}B^2 \right] \bar{w}w \\
&\quad - \tfrac{1}{4}\,p^{-2}\Phi'(1+\Phi)\,(\bar{w}'w + \bar{w}w').
\end{aligned} \qquad (2.171) $$

From (2.104) we have

$$ \frac{\partial m_2}{\partial \bar{w}} - \frac{d}{dz}\frac{\partial m_2}{\partial \bar{w}'} = 0. \qquad (2.172) $$

This yields the paraxial ray equation in reduced coordinates as
follows:

$$ w'' + \left[ \tfrac{3}{4}\,p^{-4}\Phi'^2(1+\Phi)^2 - \tfrac{1}{2}\,p^{-2}\Phi'^2 + \tfrac{1}{4}\,p^{-2}B^2 \right] w = 0, \qquad (2.173) $$

where the second-order derivative $\Phi''$ has been successfully eliminated.
Although the reduced ray w(z) offers a practical simplification
for obtaining the paraxial ray solution numerically, it offers
no advantage for obtaining the aberrations.


Axial symmetry permits only terms (2.151, 2.159) in m of the
form

    $X^2 + Y^2$    $X'^2 + Y'^2$    $2(XY' - X'Y)$                $2(XX' + YY')$
    $\bar{u}u$     $\bar{u}'u'$     $i(\bar{u}'u - \bar{u}u')$    $\bar{u}'u + \bar{u}u'$

    $x^2 + y^2$    $x'^2 + y'^2$    $2(xy' - x'y)$                $2(xx' + yy')$
    $\bar{v}v$     $\bar{v}'v'$     $i(\bar{v}'v - \bar{v}v')$    $\bar{v}'v + \bar{v}v'$
    $\bar{w}w$     $\bar{w}'w'$     $i(\bar{w}'w - \bar{w}w')$    $\bar{w}'w + \bar{w}w'$


This is shown by replacing $x \to y$ and $y \to -x$, for example,
corresponding to a rotation of the coordinate system by 90 degrees
in the transverse plane. The reader can easily verify that
all of the above products are invariant under all such 90 degree
rotations. Exactly four independent degrees of freedom exist, corresponding
to two transverse coordinates and two transverse slope
components. As a result, four independent products exist for each
line of the above table. These facts represent the necessary and
sufficient conditions for axial symmetry.

We have derived a prescription for obtaining the general solution 
in the paraxial approximation. The linearity of the paraxial ray 
equation ensures perfect imaging in this approximation. Departure 
from perfect imaging represents aberration. This will be treated 
in a later section using a perturbation approach. First, however, 
it is instructive to extend the preceding arguments to include the 
effects of space charge. This is the subject of the next section. 
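
The 90 degree rotation argument used above for the table of axially symmetric products can be checked numerically. The following is a minimal Python sketch, not part of the original derivation. The function names and the sample ray are hypothetical and chosen only for illustration. It applies x → y, y → −x to the coordinates and slopes, confirms that the four real products are unchanged, and shows that a product such as x² − y², which does not appear in the table, is not.

```python
def invariants(x, y, xp, yp):
    """Four second-order products allowed by axial symmetry (xp, yp are slopes)."""
    return (
        x * x + y * y,             # x^2 + y^2
        xp * xp + yp * yp,         # x'^2 + y'^2
        2.0 * (x * yp - xp * y),   # 2 (x y' - x' y)
        2.0 * (x * xp + y * yp),   # 2 (x x' + y y')
    )


def rotate90(x, y, xp, yp):
    """Apply x -> y, y -> -x to the coordinates and to the slopes."""
    return (y, -x, yp, -xp)


def not_invariant(x, y, xp, yp):
    """x^2 - y^2, a product that is excluded by axial symmetry."""
    return x * x - y * y


ray = (0.3, -1.2, 0.05, 0.7)       # arbitrary sample ray (hypothetical values)
rotated = rotate90(*ray)
print(invariants(*ray))
print(invariants(*rotated))        # agrees with the line above
print(not_invariant(*ray), not_invariant(*rotated))  # opposite signs
```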


2.5.4 Space charge 


In the classical limit, particles in the beam can be regarded as dis- 
crete, point charges. The particles propagate together, each with 
its own velocity. If the beam is sufficiently dense, the particles 
interact with one another via the Lorentz force (2.15). Every par- 
ticle produces an electrostatic field E by virtue of its charge, and 
a magnetic field B by virtue of its current. These fields, in turn, 
act on the other particles in the beam. 


A proper analysis in the classical limit treats the particles as dis- 
crete, and randomly distributed within the beam. This will be done 
in the later section on the stochastic interaction. A great deal of 
understanding can be gained by regarding the beam as a contin- 
uum of charge and current, however. We consider the effect of the 
fields generated on a test particle which moves with the beam. 
We imagine an axially symmetric, monoenergetic beam character-
ized by space charge density ρ(r) and current density j(r). The
geometry in the lab frame is shown schematically in Figure 2.7. 
Initially the beam is assumed to be parallel, in that the direction 
of the local current density vector throughout the beam cross sec-
tion points everywhere along the z-axis at the left of the figure.


Figure 2.7: Space charge, quasi-parallel beam. 


Considering a cylinder of radius r, the electric field vector E points 
radially outward for a positively charged beam, and radially inward 
for a negatively charged beam. Gauss’s law can be written in SI
units as


\oint \mathbf{E} \cdot d\mathbf{S} = \frac{1}{\epsilon_0} \int \rho(r) \, dV, (2.174)


where the volume integral on the right is the total charge enclosed
within the cylinder. Expressing the elements of surface area and
volume in cylindrical coordinates, this becomes


E_r(r) = \frac{1}{\epsilon_0 r} \int_0^r \rho(r_1) r_1 \, dr_1. (2.175)
The ends of the cylinder do not contribute, because the electric 
field vector is coplanar with the end faces. Any axially symmetric 
charge outside of the radius r does not contribute to the electric 
field. Separately, Ampere’s law can be written in SI units as 


\oint \mathbf{B} \cdot d\mathbf{l} = \mu_0 \int j \, dA, (2.176)


where the integral on the left is the line integral around a circular 
path of radius r, and the integral on the right is the area integral 
over a circular disk, which is oriented in the transverse plane. The 
integral on the right side is the total current enclosed within the 
cylinder. Expressing the elements of path length and transverse
area in polar coordinates, this becomes

B_\theta(r) = \frac{\mu_0}{r} \int_0^r j(r_1) r_1 \, dr_1. (2.177)


We now consider the Lorentz force (2.15) acting on a test particle 
of charge q at radius r which moves with the beam. This particle 
is depicted schematically by the small circle in Figure 2.7. The 
radial component of the Lorentz force can be written as 


\gamma m \frac{d^2 r}{dt^2} = \frac{q}{\epsilon_0 v r} \left( 1 - \frac{v^2}{c^2} \right) \int_0^r j(r_1) r_1 \, dr_1, (2.178)


where γ is defined by (2.9), and we have made use of
j(r) = \rho(r) v, (2.179)


and
\mu_0 \epsilon_0 = 1/c^2. (2.180)


The first term in large parentheses represents the outwardly di-
rected electrostatic force arising from the space charge ρ, while
the second term represents the inwardly directed magnetic force
arising from the space current j. The relative strength of these two
forces approaches equality in the extreme relativistic limit where
v²/c² → 1. Physically, this occurs because the interaction time
approaches zero in the lab frame. The presence of opposing elec- 
trostatic and magnetic forces can therefore be regarded as a purely 
relativistic effect. We now write 

\frac{dr}{dt} = \frac{dr}{dz} \frac{dz}{dt} = v r', \qquad \frac{d^2 r}{dt^2} = v^2 r'', (2.181)
where primes denote differentiation with respect to the axial co-
ordinate z. This leads immediately to


r''(z) = \frac{q m^2}{\epsilon_0 p^3 r} \int_0^r j(r_1) r_1 \, dr_1, (2.182)


where p is the relativistic scalar kinetic momentum obeying (2.28,
2.20),
p = \beta \gamma m c. (2.183)
The momentum p is constant to first order in this approximation.
The momentum p is related to the kinetic energy T and relativistic 
beam voltage V* by (2.96). The equation (2.182) can be regarded 
as a general differential-integral equation for the trajectory r(z) 
for a quasi-parallel beam in the first-order (paraxial) continuum 
approximation. It is possible in principle to solve this equation for 
the trajectory r(z) of the test particle. This trajectory is shown 
schematically as the bold curve in Figure 2.7. This curve traces 
out the envelope of the expanding beam. 


In order to gain a further appreciation of the physical significance 
of this, we consider the special case where the current density j is
constant within the volume of the beam. We can write j(r) = jo, 
independent of radius r. We further assume that the effect is weak, 
such that the expansion of the beam is small relative to the beam 
radius. In this case the equation (2.182) reduces to


r''(z) = \left( \frac{q m^2 j_0}{2 \epsilon_0 p^3} \right) r(z). (2.184)


The leading factor in large parentheses on the right side can be 
regarded as constant in this approximation. Physically, the left 
side represents the bending of the ray. This is proportional to the 
distance r off axis. This is precisely the condition for a perfect 
lens, with the result that defocusing occurs, but no blurring. This 
defocus can be corrected in principle, and has no net effect on the 
quality of the image. 
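
To get a feel for the size of this defocusing, the uniform-beam equation (2.184) can be integrated numerically. The sketch below is an illustration only, written in Python under stated assumptions: SI constants are CODATA values, the beam is taken to be electrons, the current density and beam energy in the usage lines are hypothetical, and the integrator is a plain fourth-order Runge-Kutta scheme. For constant K the exact solution with r'(0) = 0 is r(z) = r(0) cosh(K^{1/2} z), which gives a simple check on the integration.

```python
import math

E_CHARGE = 1.602176634e-19    # elementary charge, C
E_MASS = 9.1093837015e-31     # electron rest mass, kg
C_LIGHT = 299792458.0         # speed of light, m/s
EPS0 = 8.8541878128e-12       # vacuum permittivity, F/m


def momentum(kinetic_energy_ev, mass=E_MASS):
    """Relativistic momentum p = beta*gamma*m*c for a kinetic energy in eV."""
    gamma = 1.0 + kinetic_energy_ev * E_CHARGE / (mass * C_LIGHT ** 2)
    return mass * C_LIGHT * math.sqrt(gamma * gamma - 1.0)


def defocus_strength(j0, kinetic_energy_ev, charge=E_CHARGE, mass=E_MASS):
    """Leading factor K = q m^2 j0 / (2 eps0 p^3) of (2.184), in 1/m^2."""
    p = momentum(kinetic_energy_ev, mass)
    return charge * mass ** 2 * j0 / (2.0 * EPS0 * p ** 3)


def integrate_envelope(K, r0, z_end, steps=2000):
    """RK4 integration of r'' = K r with r(0) = r0 and r'(0) = 0."""
    h = z_end / steps
    r, rp = r0, 0.0
    for _ in range(steps):
        k1r, k1p = rp, K * r
        k2r, k2p = rp + 0.5 * h * k1p, K * (r + 0.5 * h * k1r)
        k3r, k3p = rp + 0.5 * h * k2p, K * (r + 0.5 * h * k2r)
        k4r, k4p = rp + h * k3p, K * (r + h * k3r)
        r += h * (k1r + 2.0 * k2r + 2.0 * k3r + k4r) / 6.0
        rp += h * (k1p + 2.0 * k2p + 2.0 * k3p + k4p) / 6.0
    return r, rp


# Hypothetical example: 10 keV electrons, 100 A/m^2, 0.1 m of drift.
K = defocus_strength(j0=100.0, kinetic_energy_ev=1.0e4)
r_end, slope_end = integrate_envelope(K, r0=1.0e-3, z_end=0.1)
print(K, r_end, 1.0e-3 * math.cosh(math.sqrt(K) * 0.1))
```

The last two printed numbers should agree closely. Because the force is linear in r, every ray in the uniform beam is scaled by the same factor, which is the defocus without blurring described above.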


With this intuitive picture in mind, we now proceed to apply the 
methods of the preceding sections, taking the average space charge 
and space current into account. We return to dimensionless units 
(2.110 to 2.117) at this point. The electrostatic potential in the 
presence of space charge obeys Poisson’s equation,


\nabla^2 \phi = -\rho. (2.185)


We assume the space charge ρ = ρ(z) to be uniform in the trans-
verse plane while varying in the axial direction. From (2.120, 2.121,
2.185),


\phi(r, z) = \Phi - \frac{1}{4} (\Phi'' + \rho) r^2 + \frac{1}{64} (\Phi'''' + \rho'') r^4 + \ldots (2.186)


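The scalar kinetic momentum is (2.116, 2.186)
p(r, z) = p - \frac{1}{4} p^{-1} (\Phi'' + \rho) (1+\Phi) r^2
          + \frac{1}{64} p^{-1} (\Phi'''' + \rho'') (1+\Phi) r^4 - \frac{1}{32} p^{-3} (\Phi'' + \rho)^2 r^4 + \ldots
(2.187)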


In addition to the space charge, a beam in general has a global 
(averaged) space current j. Maxwell’s equation is


\nabla \times \mathbf{B} = \nabla \times (\nabla \times \mathbf{A}) = \mathbf{j}. (2.188)


In the lab frame, the beam current is primarily along the beam 
axis, with the transverse component relatively small. We there-
fore neglect the transverse components of j and A. It follows that
(2.188):

\frac{\partial^2 A_z}{\partial r^2} + \frac{1}{r} \frac{\partial A_z}{\partial r} = -j_z. (2.189)

The solution for the axial component of the magnetic vector po-
tential A_z arising from the space current is


A_z = -\frac{1}{4} j_z r^2, (2.190)

where the space current and space charge are related by


j_z = \rho v_z = \frac{\rho p}{1+\Phi}. (2.191)


Using the earlier approach, we obtain the modified refractive index 
(2.147) as follows: 


m = p \sqrt{1 + r'^2 + r^2 \theta'^2} - r \theta' A_\theta - A_z. (2.192)


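The expression of (2.159) with space charge terms added to (2.192)
gives the result


m = m_0 + m_2 + m_4 + \ldots
  = p
  + \left[ \frac{1}{2} p \right] \bar{v}' v'
  + \left[ -\frac{1}{4} p^{-1} \Phi'' (1+\Phi) - \frac{1}{4} \rho \left( p^{-1} (1+\Phi) - p (1+\Phi)^{-1} \right) - \frac{1}{8} p^{-1} B^2 \right] \bar{v} v
  + \left[ \frac{1}{64} p^{-1} (\Phi'''' + \rho'') (1+\Phi) - \frac{1}{32} p^{-3} (\Phi'' + \rho)^2
      + \frac{1}{32} p^{-1} B B'' - \frac{1}{128} p^{-3} B^4
      - \frac{1}{32} p^{-3} (\Phi'' + \rho) (1+\Phi) B^2 \right] (\bar{v} v)^2
  + \left[ -\frac{1}{8} p^{-1} (\Phi'' + \rho) (1+\Phi) - \frac{1}{16} p^{-1} B^2 \right] \bar{v} v \, \bar{v}' v'
  + \left[ -\frac{1}{8} p \right] (\bar{v}' v')^2
  + \left[ -\frac{1}{32} p^{-2} B^3 - \frac{1}{16} p^{-2} (\Phi'' + \rho) (1+\Phi) B + \frac{1}{32} B'' \right] i (\bar{v}' v - \bar{v} v') \, \bar{v} v
  + \left[ -\frac{1}{8} B \right] i (\bar{v}' v - \bar{v} v') \, \bar{v}' v'
  + \left[ -\frac{1}{32} p^{-1} B^2 \right] [ i (\bar{v}' v - \bar{v} v') ]^2
  + \ldots (2.193)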


We note that the paraxial space charge term (large parentheses) 
tends to zero in the extreme relativistic limit Φ ≫ 1. Physically,
this occurs because the space current causes magnetic compression 
of the beam, due to parallel current elements. This purely rela- 
tivistic effect offsets the Coulomb repulsion of the space charge. 
Equivalently, the interaction time approaches zero in the lab frame, 
causing the space charge interaction to approach zero as well. 
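
The cancellation between the electrostatic and magnetic contributions can be made concrete with a few numbers. The Python sketch below is illustrative only. It assumes electrons, and it assumes that in natural units the on-axis momentum is p = (2Φ + Φ²)^{1/2} with Φ measured in units of mc²/e, so that the electrostatic term of (2.187) and the magnetic term of (2.190, 2.191) combine into a factor proportional to (1+Φ)/p − p/(1+Φ). The function names are hypothetical.

```python
import math

ELECTRON_REST_ENERGY_EV = 510998.95   # m c^2 for the electron, in eV


def beta_squared(kinetic_energy_ev, rest_energy_ev=ELECTRON_REST_ENERGY_EV):
    """v^2/c^2, the ratio of magnetic to electrostatic force in (2.178)."""
    gamma = 1.0 + kinetic_energy_ev / rest_energy_ev
    return 1.0 - 1.0 / (gamma * gamma)


def paraxial_space_charge_factor(phi):
    """(1+Phi)/p - p/(1+Phi), with p = sqrt(2 Phi + Phi^2) in natural units."""
    p = math.sqrt(2.0 * phi + phi * phi)
    return (1.0 + phi) / p - p / (1.0 + phi)


for energy_ev in (1.0e3, 1.0e5, 1.0e7, 1.0e9):
    phi = energy_ev / ELECTRON_REST_ENERGY_EV
    print(energy_ev, beta_squared(energy_ev), paraxial_space_charge_factor(phi))
```

As the energy rises, the first printed ratio approaches one and the net factor falls toward zero. Algebraically the factor equals 1/[p(1+Φ)], since (1+Φ)² − p² = 1 under the stated assumption.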


2.5.5 The primary geometrical aberrations 


We showed previously that the exact ray equation for the case 
of axial symmetry cannot be solved analytically in closed form. 
Consequently, we adopted a series solution. The paraxial approxi-
mation leads to a linear, second order differential equation for the
ray, which can be solved in principle, but is only accurate for rays
near the optic axis. The next step is to solve for the aberrations. 
This requires the next approximation beyond the paraxial approx- 
imation. The following analysis closely follows that of Glaser [33], 
and in addition properly includes the effects of special relativity. 
This is also treated in detail by Rose [75]. 


We begin by defining the action integral (mechanical analog of
the light-optical path length) W_2 between any two planes z_a and
z_b in the paraxial approximation,


W_2 = \int_{z_a}^{z_b} m_2 \, dz, (2.194)


where m_2 is assumed to be known from the preceding analysis
(2.159). The paraxial ray equation is


\frac{\partial m_2}{\partial x_j} - \frac{d}{dz} \frac{\partial m_2}{\partial x'_j} = 0. (2.195)


The solution for the transverse Cartesian coordinates x_j(z) and
canonical momentum components P_j(z) in the paraxial approxi-
mation is (2.168)


x_j(z) = x_{Oj} g(z) + x_{Aj} h(z)
P_j(z) = p(z) [ x_{Oj} g'(z) + x_{Aj} h'(z) ], (2.196)


where we have made use of (2.106, 2.159) and v(z) = x(z) + iy(z)
to obtain
P_j(z) = p(z) x'_j(z) (2.197)


in the paraxial approximation. 
To obtain the aberration, we form the first order perturbation
on the paraxial ray (2.108),


\delta W_2 = \int_{z_a}^{z_b} (\delta m_2) \, dz = \sum_{j=1}^{2} ( P_{bj} \delta x_{bj} - P_{aj} \delta x_{aj} ), (2.198)


where the start plane z_a and the end plane z_b are assumed to be
fixed. From this it follows that


o 


= z5. (W2), Over = ~ OP, 


(5Ws), (2.199) 


where we designate δx_{aj} and δx_{bj} as the primary aberration at the
start plane z_a and end plane z_b respectively. We invert the paraxial
solution (2.196) to solve for (x_{Oj}, x_{Aj}) in terms of (x_j, P_j):


x_{Oj} = k^{-1} ( p h' x_j - h P_j )
x_{Aj} = k^{-1} ( g P_j - p g' x_j ), (2.200)


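where k is the conserved Wronskian (2.165). From the chain rule
for partial derivatives,


\frac{\partial}{\partial P_j} (\delta W_2)
  = \left( \frac{\partial x_{Oj}}{\partial P_j} \frac{\partial}{\partial x_{Oj}} + \frac{\partial x_{Aj}}{\partial P_j} \frac{\partial}{\partial x_{Aj}} \right) (\delta W_2)
  = \left( -\frac{h}{k} \frac{\partial}{\partial x_{Oj}} + \frac{g}{k} \frac{\partial}{\partial x_{Aj}} \right) (\delta W_2). (2.201)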


We now identify the start plane with the object plane, z_a = z_O,
and we identify the end plane with the Gaussian image plane,
z_b = z_I. From the paraxial solution we have, by definition


h(z_I) = 0, \qquad g(z_I) = M, (2.202)


where M is the magnification. The primary aberration at the Gaus-
sian image plane is then (2.199, 2.201)


\delta x_{Ij} = \frac{M}{k} \frac{\partial}{\partial x_{Aj}} \int_{z_O}^{z_I} m_4 \, dz, (2.203)


where we have identified the perturbation δm_2 = m_4. From (2.159)
we can express the perturbation m_4 in the compact series form as


m_4 = L (x^2 + y^2)^2 + M (x^2 + y^2) (x'^2 + y'^2) + N (x'^2 + y'^2)^2
      + P \cdot 2 (x y' - x' y) (x^2 + y^2) + Q \cdot 2 (x y' - x' y) (x'^2 + y'^2)
      + K \cdot 4 (x y' - x' y)^2, (2.204)
where we have substituted


\bar{v} v = x^2 + y^2
\bar{v}' v' = x'^2 + y'^2
i (\bar{v}' v - \bar{v} v') = 2 (x y' - x' y) (2.205)


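in (2.159). We have defined field coefficients (2.159, 2.204) in nat-
ural units as
L = \frac{1}{64} p^{-1} \Phi'''' (1+\Phi) - \frac{1}{32} p^{-3} \Phi''^2 + \frac{1}{32} p^{-1} B B'' - \frac{1}{128} p^{-3} B^4
    - \frac{1}{32} p^{-3} \Phi'' (1+\Phi) B^2

M = -\frac{1}{8} p^{-1} \Phi'' (1+\Phi) - \frac{1}{16} p^{-1} B^2

N = -\frac{1}{8} p

P = -\frac{1}{32} p^{-2} B^3 - \frac{1}{16} p^{-2} \Phi'' (1+\Phi) B + \frac{1}{32} B''

Q = -\frac{1}{8} B

K = -\frac{1}{32} p^{-1} B^2 (2.206)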


remembering the definition (2.124) for the scalar kinetic momen-
tum on axis p. The effects of uniform space charge density ρ(z)
can be included (2.193) by substituting
\Phi'' \to \Phi'' + \rho
\Phi'''' \to \Phi'''' + \rho'' (2.207)

in the field coefficients L, M, and P above (2.206). The solution
(2.168) for the paraxial ray x_j(z) is

x(z) = x_O g(z) + x_A h(z)
y(z) = y_O g(z) + y_A h(z)
x'(z) = x_O g'(z) + x_A h'(z)
y'(z) = y_O g'(z) + y_A h'(z). (2.208)


Following Glaser, we define a new variable set


R = x_O^2 + y_O^2
\rho = x_A^2 + y_A^2
\chi = x_O x_A + y_O y_A
\sigma = x_O y_A - y_O x_A. (2.209)
It follows (2.208, 2.209) that


x^2 + y^2 = R g^2 + \rho h^2 + 2 \chi g h
x'^2 + y'^2 = R g'^2 + \rho h'^2 + 2 \chi g' h'
2 (x y' - x' y) = 2 (g h' - g' h) \sigma = 2 p^{-1} k \sigma. (2.210)


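From (2.204, 2.210) we express the perturbation m_4 in terms of
the new variable set as


m_4 = L (R g^2 + \rho h^2 + 2 \chi g h)^2
    + M (R g^2 + \rho h^2 + 2 \chi g h) (R g'^2 + \rho h'^2 + 2 \chi g' h')
    + N (R g'^2 + \rho h'^2 + 2 \chi g' h')^2
    + P (2 \sigma k p^{-1}) (R g^2 + \rho h^2 + 2 \chi g h)
    + Q (2 \sigma k p^{-1}) (R g'^2 + \rho h'^2 + 2 \chi g' h')
    + K [ 4 k^2 p^{-2} (\rho R - \chi^2) ], (2.211)


where we have made use in the K-term of the readily verifiable
fact that (2.209)
\sigma^2 = \rho R - \chi^2. (2.212)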
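
The identities (2.210) and (2.212) are purely algebraic, so they can be spot-checked with arbitrary numbers. The following Python sketch is an illustration, not part of the original text. The sample values of the object and aperture coordinates and of g, h, g', h' at some plane are hypothetical, and the function names are chosen only for this example.

```python
def glaser_variables(xo, yo, xa, ya):
    """R, rho, chi, sigma of (2.209) from object and aperture coordinates."""
    R = xo * xo + yo * yo
    rho = xa * xa + ya * ya
    chi = xo * xa + yo * ya
    sigma = xo * ya - yo * xa
    return R, rho, chi, sigma


def paraxial_ray(xo, yo, xa, ya, g, h, gp, hp):
    """x, y, x', y' of (2.208) at a plane where g, h, g', h' have the given values."""
    return (xo * g + xa * h, yo * g + ya * h,
            xo * gp + xa * hp, yo * gp + ya * hp)


# Hypothetical sample values; any real numbers should do.
xo, yo, xa, ya = 0.4, -0.3, 0.02, 0.05
g, h, gp, hp = 0.7, 1.9, -0.6, 0.25

R, rho, chi, sigma = glaser_variables(xo, yo, xa, ya)
x, y, xp, yp = paraxial_ray(xo, yo, xa, ya, g, h, gp, hp)

# Each line prints the two sides of one identity; they should agree.
print(x * x + y * y, R * g * g + rho * h * h + 2 * chi * g * h)
print(xp * xp + yp * yp, R * gp * gp + rho * hp * hp + 2 * chi * gp * hp)
print(2 * (x * yp - xp * y), 2 * (g * hp - gp * h) * sigma)
print(sigma * sigma, rho * R - chi * chi)
```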


Expanding (2.211) and collecting terms, we now form


\int_{z_O}^{z_I} m_4 \, dz = A R^2 + B \rho^2 + C \chi^2 + D R \rho + E R \chi
                           + F \rho \chi + e R \sigma + f \rho \sigma + c \chi \sigma, (2.213)


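where we have defined new coefficients A, ..., c. Collecting terms
in (2.213) according to the definition (2.206), the new coefficients
are


A = \int_{z_O}^{z_I} ( L g^4 + M g^2 g'^2 + N g'^4 ) \, dz

B = \int_{z_O}^{z_I} ( L h^4 + M h^2 h'^2 + N h'^4 ) \, dz

C = \int_{z_O}^{z_I} ( 4 L g^2 h^2 + 4 M g h g' h' + 4 N g'^2 h'^2 - 4 K k^2 p^{-2} ) \, dz

D = \int_{z_O}^{z_I} [ 2 L g^2 h^2 + M ( g^2 h'^2 + h^2 g'^2 ) + 2 N g'^2 h'^2 + 4 K k^2 p^{-2} ] \, dz

E = \int_{z_O}^{z_I} [ 4 L g^3 h + 2 M g g' ( g h' + g' h ) + 4 N g'^3 h' ] \, dz

F = \int_{z_O}^{z_I} [ 4 L g h^3 + 2 M h h' ( g h' + g' h ) + 4 N g' h'^3 ] \, dz

e = 2 k \int_{z_O}^{z_I} p^{-1} ( P g^2 + Q g'^2 ) \, dz

c = 4 k \int_{z_O}^{z_I} p^{-1} ( P g h + Q g' h' ) \, dz

f = 2 k \int_{z_O}^{z_I} p^{-1} ( P h^2 + Q h'^2 ) \, dz, (2.214)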


remembering that g(z) and h(z) are the two linearly independent 
solutions to the paraxial ray equation (2.162) satisfying boundary 
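conditions (2.166). From (2.213) we have


\frac{\partial}{\partial x_{Aj}} \int_{z_O}^{z_I} m_4 \, dz = A \frac{\partial}{\partial x_{Aj}} (R^2) + B \frac{\partial}{\partial x_{Aj}} (\rho^2) + C \frac{\partial}{\partial x_{Aj}} (\chi^2) + \ldots + c \frac{\partial}{\partial x_{Aj}} (\chi \sigma).
(2.215)
The primary aberration δx_I in the Gaussian image plane is thus
given in the rotated coordinate system by (2.203, 2.215)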


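k M^{-1} \delta x_I = B [ 4 x_A ( x_A^2 + y_A^2 ) ]
                    + C [ 2 x_O ( x_O x_A + y_O y_A ) ]
                    + D [ 2 x_A ( x_O^2 + y_O^2 ) ]
                    + E [ x_O ( x_O^2 + y_O^2 ) ]
                    + F [ x_O ( x_A^2 + y_A^2 ) + 2 x_A ( x_O x_A + y_O y_A ) ]
                    + e [ -y_O ( x_O^2 + y_O^2 ) ]
                    + c [ y_A ( x_O^2 - y_O^2 ) - 2 x_O y_O x_A ]
                    + f [ -y_O ( x_A^2 + y_A^2 ) + 2 x_A ( x_O y_A - y_O x_A ) ],
(2.216)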


remembering the definition (2.165) of the constant k, and where
M is the magnification. Making use of the axial symmetry, the
y-coordinate of the aberration is obtained by making the substi-
tutions x → y and y → −x for all occurrences in (2.216).
This gives


k M^{-1} \delta y_I = B [ 4 y_A ( x_A^2 + y_A^2 ) ]
                    + C [ 2 y_O ( x_O x_A + y_O y_A ) ]
                    + D [ 2 y_A ( x_O^2 + y_O^2 ) ]
                    + E [ y_O ( x_O^2 + y_O^2 ) ]
                    + F [ y_O ( x_A^2 + y_A^2 ) + 2 y_A ( x_O x_A + y_O y_A ) ]
                    + e [ x_O ( x_O^2 + y_O^2 ) ]
                    + c [ -x_A ( y_O^2 - x_O^2 ) + 2 x_O y_O y_A ]
                    + f [ x_O ( x_A^2 + y_A^2 ) + 2 y_A ( x_O y_A - y_O x_A ) ].
(2.217)


We notice that
\delta x_{Oj} = M^{-1} \delta x_{Ij} (2.218)


is the aberration, demagnified to the object plane z_O. We call δx_{Oj}
the aberration referred to the object plane. This is of interest for
a transmission electron microscope, for example, where the object
coordinates form the natural reference for the expression of image
quality. Similarly, one has the option of substituting x_O = M^{-1} x_I
and y_O = M^{-1} y_I on the right sides of (2.216, 2.217), thus refer-
ring the aberrations to the image plane z_I. This is of interest for
a probe forming system, like a scanning electron microscope or 
focused ion beam system, where one typically forms a demagni- 
fied image of a source. In this case, the image coordinates form 
the natural reference. Either object or image coordinates correctly 
express the aberrations. 


A significant simplification is possible by choosing the rotation 
of coordinate axes so that y_O = 0; i.e., the off-axis object position
is located along the x-axis. There is no loss of generality, owing to 
the axial symmetry, as the coordinates for any single object point 
can always be chosen in this way.
In this case the primary aberration (δx_I, δy_I) simplifies (2.216,
2.217) to


k M^{-1} \delta x_I = 4 B x_A ( x_A^2 + y_A^2 ) + 2 (C + D) x_O^2 x_A + E x_O^3
                    + F x_O ( 3 x_A^2 + y_A^2 ) + c x_O^2 y_A + 2 f x_O x_A y_A

k M^{-1} \delta y_I = 4 B y_A ( x_A^2 + y_A^2 ) + 2 D x_O^2 y_A + 2 F x_O x_A y_A
                    + e x_O^3 + c x_O^2 x_A + f x_O ( x_A^2 + 3 y_A^2 ).


(2.219)
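
As a consistency check on the reduction from (2.216, 2.217) to (2.219), both forms can be evaluated for the same ray. The Python sketch below is illustrative only. The coefficient values are hypothetical placeholders, not results for any real lens, and both functions return the left-hand combination k M^{-1} δx_I, k M^{-1} δy_I rather than the aberration itself.

```python
def aberration_general(c, xo, yo, xa, ya):
    """Right sides of (2.216) and (2.217) for a general object point."""
    R = xo * xo + yo * yo
    rho = xa * xa + ya * ya
    chi = xo * xa + yo * ya
    sigma = xo * ya - yo * xa
    dx = (c["B"] * 4 * xa * rho + c["C"] * 2 * xo * chi + c["D"] * 2 * xa * R
          + c["E"] * xo * R + c["F"] * (xo * rho + 2 * xa * chi)
          + c["e"] * (-yo * R)
          + c["c"] * (ya * (xo * xo - yo * yo) - 2 * xo * yo * xa)
          + c["f"] * (-yo * rho + 2 * xa * sigma))
    dy = (c["B"] * 4 * ya * rho + c["C"] * 2 * yo * chi + c["D"] * 2 * ya * R
          + c["E"] * yo * R + c["F"] * (yo * rho + 2 * ya * chi)
          + c["e"] * (xo * R)
          + c["c"] * (-xa * (yo * yo - xo * xo) + 2 * xo * yo * ya)
          + c["f"] * (xo * rho + 2 * ya * sigma))
    return dx, dy


def aberration_on_x_axis(c, xo, xa, ya):
    """Right sides of (2.219), valid for an object point with y_O = 0."""
    rho = xa * xa + ya * ya
    dx = (4 * c["B"] * xa * rho + 2 * (c["C"] + c["D"]) * xo * xo * xa
          + c["E"] * xo ** 3 + c["F"] * xo * (3 * xa * xa + ya * ya)
          + c["c"] * xo * xo * ya + 2 * c["f"] * xo * xa * ya)
    dy = (4 * c["B"] * ya * rho + 2 * c["D"] * xo * xo * ya
          + 2 * c["F"] * xo * xa * ya + c["e"] * xo ** 3
          + c["c"] * xo * xo * xa + c["f"] * xo * (xa * xa + 3 * ya * ya))
    return dx, dy


# Hypothetical coefficient values, for illustration only.
coeffs = {"B": 1.3, "C": -0.4, "D": 0.9, "E": 0.2, "F": -0.7,
          "e": 0.15, "c": -0.05, "f": 0.3}
print(aberration_general(coeffs, 0.2, 0.0, 0.03, -0.01))
print(aberration_on_x_axis(coeffs, 0.2, 0.03, -0.01))
```

The two printed pairs should agree to rounding error. Setting x_O = 0 as well leaves only the B-term.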


The various series representations of the aberration are known as a
Seidel series. The individual terms in (B, C, D, E, F, e, c, f) repre-
sent, respectively, spherical aberration, isotropic astigmatism, field
curvature, isotropic distortion, isotropic coma, anisotropic distor-
tion, anisotropic astigmatism, and anisotropic coma. They are re-
ferred to as third order aberrations, because each term is third or-
der in various products including (x_O, y_O, x_A, y_A). This represents
the solution for the primary aberrations. All quantities represent-
ing length have the same values in natural units and SI units. This
includes the aberrations δx_I and δy_I. However, the axial potential
Φ(z) and axial magnetic field B(z), which form the basis of the
field coefficients (2.206), do depend on the choice of units.


The preceding results give the aberration for a single ray. In prac-
tice, a beam is composed of a bundle of rays, each having a dif-
ferent aberration δx_{Ij} in the Gaussian image plane. Even in the
limit of an ideal point object, these aberrations cause blurring of
the image. The amount of blurring varies with defocus. It is there-
fore of interest to study the aberration in a plane which is slightly
displaced from the Gaussian image plane. Designating the axial
displacement by δz, we seek an expression for the aberration δx_j
in the plane z_I + δz. This is given to first order in δz by


x_j + \delta x_j = x_{Ij} + \delta x_{Ij} + x'_{Ij} \delta z
\delta x_j = \delta x_{Ij} + x'_{Ij} \delta z, (2.220)


where x_{Ij} is the paraxial ray coordinate in the Gaussian image
plane, δx_{Ij} is the primary aberration in the Gaussian image plane,
and x'_{Ij} is the paraxial ray slope in the Gaussian image plane.
Evaluating (2.168) in the Gaussian image plane, this gives


x'_{Ij} = x_{Oj} g'_I + x_{Aj} h'_I. (2.221)
The aberration δx_j with defocus δz is thus given by
\delta x_j = \delta x_{Ij} + ( x_{Oj} g'_I + x_{Aj} h'_I ) \delta z. (2.222)


For a fixed value of δz, this is computed separately for every ray
in the bundle, which in principle enables the blur to be found as
a function of defocus δz.


We define a quantity
W_4 = \int_{z_O}^{z_I} m_4 \, dz. (2.223)


We recognize W_4 = δW_2 as the first order perturbation on the
action integral (2.194) which gives rise to the primary aberration. 
All rays emanating from any single object point have the same 
value of W2, corresponding to the paraxial approximation. Each 
of these rays has a unique value of W4 which in general differs 
from the other rays, corresponding to the aberration. The above 
analysis shows that all relevant information about the primary 
aberration is contained in W4. 


One can derive higher order aberrations by considering the terms
m_6, m_8, ... in (2.159). These aberrations are referred to as fifth
order, seventh order, ..., respectively. They have corresponding
perturbations W_6, W_8, ... in the action integral. This procedure is
straightforward, but tedious, since each increasing order contains
more terms than the preceding one. The number of terms needed
to obtain an accurate representation depends on the size of the ray
coordinates x_j(z) and slopes x'_j(z). This in turn depends on the
lateral extent of the beam, as determined by the physical aperture.
A narrow beam requires fewer terms than a wide beam. The exact
value of the action integral between the object and image planes
is given by

W_{OI} = \int_{z_O}^{z_I} m(z) \, dz, (2.224)


where m(z) is the sum of all terms in the infinite series (2.159).
In principle, all relevant information about the optical system is 
contained in W in the limit of geometrical optics. This fact will 
prove to be highly useful in the analysis to come. 


2.5.6 Spherical aberration 


In the case where the object is on the optic axis, we have x_{Oj} = 0.
In this case, the primary aberration in the Gaussian image plane
reduces to (2.216, 2.217)


\delta x_{Ij} = \frac{4 B M}{k} ( x_A^2 + y_A^2 ) x_{Aj}. (2.225)


All of the aberrations vanish on axis except spherical aberration,
represented by the B-term. Spherical aberration is the same ev-
erywhere in the field, as it is independent of object coordinate
x_{Oj}. It only depends on the coordinate x_{Aj} in the aperture plane.
Spherical aberration is shown schematically in Figure 2.8. The axis
of symmetry is the central ray in the bundle. Rays close to this axis
come to a focus at the Gaussian image plane, which is shown by
the rightmost vertical line in the figure. Rays at the edge are more
strongly focused, and intersect the axis in front of the Gaussian
image plane. This plane is shown by the leftmost vertical line in
the figure. A disk of least confusion is formed by the entire bundle
at an intermediate plane. This represents the optimum focus.


Figure 2.8: Spherical aberration.
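
The position and size of the disk of least confusion can be estimated with a brute-force scan. The Python sketch below is an illustration under stated assumptions: the on-axis aberration is taken to follow a cubic law C α³ in the ray angle α with a positive coefficient C, the defocus enters through the first-order term α δz as in (2.222), and the values of C and the maximum angle are hypothetical. The blur radius in a given plane is taken to be the largest transverse displacement of any ray in the bundle.

```python
def blur_radius(c_s, alpha_max, dz, samples=401):
    """Largest |C alpha^3 + alpha dz| over ray angles from 0 to alpha_max."""
    worst = 0.0
    for i in range(samples):
        alpha = alpha_max * i / (samples - 1)
        worst = max(worst, abs(c_s * alpha ** 3 + alpha * dz))
    return worst


def best_defocus(c_s, alpha_max, steps=601):
    """Scan dz between -1.5 C alpha_max^2 and 0 for the smallest blur radius."""
    span = 1.5 * c_s * alpha_max ** 2
    best = None
    for i in range(steps):
        dz = -span + span * i / (steps - 1)
        radius = blur_radius(c_s, alpha_max, dz)
        if best is None or radius < best[1]:
            best = (dz, radius)
    return best


C_S = 1.0e-3        # hypothetical aberration coefficient, metres
ALPHA_MAX = 1.0e-2  # hypothetical aperture half-angle, radians

dz, radius = best_defocus(C_S, ALPHA_MAX)
print(dz / (C_S * ALPHA_MAX ** 2))      # close to -3/4
print(radius / (C_S * ALPHA_MAX ** 3))  # close to 1/4
```

Under these assumptions the optimum plane lies in front of the Gaussian image plane, and the blur there is about one quarter of its value at the Gaussian image plane.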


Spherical aberration is easily expressed as a function of the ray
slope x'_{Ij} in the Gaussian image plane. This slope is directly mea-
surable by defocusing, whereas the aperture coordinate x_{Aj} is
not easily measured. This necessitates transforming from the set
(x_{Oj}, x_{Aj}) to the set (x_{Oj}, x'_{Ij}). Evaluating (2.168) in the Gaussian
image plane, and setting x_{Oj} = 0, we find


x_{Aj} = \frac{x'_{Ij}}{h'_I}. (2.226)
From (2.169) we have
\frac{1}{h'_I} = \frac{p_I M}{k}, (2.227)
in which case
x_{Aj} = \frac{p_I M}{k} x'_{Ij}. (2.228)
We now define radial quantities r_A, r'_I, δr_I as
r_A^2 = x_A^2 + y_A^2
r'^2_I = x'^2_I + y'^2_I
\delta r_I^2 = \delta x_I^2 + \delta y_I^2, (2.229)
so that (2.225, 2.228)
\delta r_I = \frac{4 B M^4 p_I^3}{k^4} r'^3_I. (2.230)


We define α_I as the angle in radians which the ray makes with the
optic axis at the Gaussian image plane. Assuming α_I ≪ 1,


r'_I = \tan \alpha_I \approx \alpha_I. (2.231)
This enables us to write, in this small angle approximation,
\delta r_I = C_{SI} \alpha_I^3, (2.232)


where we have defined a constant C_{SI}, called the coefficient of
spherical aberration, referred to the image, as
C_{SI} = \frac{4 B M^4 p_I^3}{k^4}, (2.233)
where C_{SI} depends on the axial electrostatic potential Φ(z) and
axial magnetic field B(z) through the coefficient B in (2.214).


Alternatively, spherical aberration can be referred to the object 
plane by making use of 


\delta r_I = M \delta r_O, (2.234)


where δr_O is the aberration in the object plane z_O. Applying the
law of Helmholtz-Lagrange (2.69), we have


p_O \alpha_O (\delta r_O) = p_I \alpha_I (\delta r_I), (2.235)


relating the object and image planes. It follows that the angles in
the object and image planes are related by


\alpha_I = \left( \frac{p_O}{p_I M} \right) \alpha_O, (2.236)


where the parenthesis on the right is the angular magnification.
The spherical aberration in the object plane is then


\delta r_O = C_{SO} \alpha_O^3, (2.237)


where we have defined the spherical aberration coefficient C_{SO},
referred to the object, as


C_{SO} = \frac{1}{M} \left( \frac{p_O}{p_I M} \right)^3 C_{SI}. (2.238)
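
The conversion between the two reference planes is a one-line calculation. The Python sketch below is a hedged illustration that assumes the chain (2.232), (2.234), (2.236), (2.237), which gives C_SO = M^{-1} [p_O / (p_I M)]³ C_SI. The function name and the numerical values are hypothetical.

```python
def cs_object_from_image(cs_image, magnification, p_object, p_image):
    """C_SO = (1/M) * (p_O / (p_I M))**3 * C_SI, following (2.236) to (2.238)."""
    angular_magnification = p_object / (p_image * magnification)
    return cs_image * angular_magnification ** 3 / magnification


# Hypothetical numbers: a lens with M = -50 and equal momenta on both sides.
print(cs_object_from_image(cs_image=6.25e3, magnification=-50.0,
                           p_object=1.0, p_image=1.0))
```

When the momenta on the object and image sides are equal, the conversion reduces to division by M⁴, so a strongly magnifying lens has an image-side coefficient that is far larger than its object-side coefficient.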


It is natural to refer the spherical aberration to the object plane 
in a transmission electron microscope, and to the image plane in
probe-forming systems such as a scanning electron microscope.


It was first shown by Scherzer [77] that spherical aberration cannot 
be eliminated in static systems with axial symmetry in the absence 
of space charge, and excluding particle mirrors. This is equivalent 
to the coefficient B defined in (2.214) always being positive defi- 
nite. This fact can be proven by successive partial integrations of 
the expression for B, in which the integrand can be expressed by a 
sum of positive definite terms. The reader is referred to Rose [75] 
for details. 


Spherical aberration imposes a fundamental limit on the resolu- 
tion of electron microscopes. Substantial correction of spherical 
aberration has been successfully demonstrated using multipole el- 
ements [67, 75]. Corrected electron microscopes are now commer- 
cially available which achieve resolution better than 0.1 nm. This 
is sufficient to view single atoms. 


2.5.7 Field aberrations 


The aberrations represented by the terms proportional to C, D, E, 
F, e, c, f in (2.216, 2.217) represent isotropic astigmatism, field 
curvature, isotropic distortion, isotropic coma, anisotropic distor- 
tion, anisotropic astigmatism, and anisotropic coma, respectively. 
Unlike spherical aberration, all of these aberrations depend on 
field position, as represented by the object coordinates (x_O, y_O).
We therefore call them field aberrations. Isotropic aberrations 
are characterized by having no dependence on azimuthal coordi- 
nate, while anisotropic aberrations have an azimuthal dependence. 
The isotropic aberrations arise in both electrostatic and magnetic 
lenses, while anisotropic aberrations only occur in magnetic lenses. 
The field aberrations can be best appreciated by simply plotting 
the geometric figure, which in general is a function of the object
coordinates (x_O, y_O) and the aperture coordinates (x_A, y_A). In this
section we discuss the field aberrations individually.


The aberrations represented by the terms proportional to C and 
c in (2.216, 2.217) represent isotropic astigmatism and anisotropic 
astigmatism, respectively. The aberration figure for astigmatism is 
plotted in Figure 2.9. The beam forms two line foci, separated by 


Figure 2.9: Astigmatism. 


an axial distance, and oriented at 90 degrees relative to each other. 
The axial separation of the line foci is proportional to the square 
of the radial distance off axis of the object point. The primary 
astigmatism aberration vanishes for an object point on axis. The 
line foci are oriented along the x- and y-axes for isotropic astigma- 
tism, and at 45 degrees to these axes for anisotropic astigmatism. 
The beam cross section is axially symmetric at an axial point mid- 
way between the two line foci. By adjusting the focus, one is able 
to locate this plane, which is characterized by an image which is 
axially symmetric, though blurred. Alternatively, astigmatism can 
arise from misalignment of the optical elements, or departure from 
axial symmetry. Although the manifestation appears similar, the 
mechanism by which the astigmatism arises is quite different from 
the primary aberration discussed here.


The aberration represented by the term proportional to D in
(2.216, 2.217) represents curvature of field. This aberration results 
in an axial displacement of the plane of best focus which is propor- 
tional to the square of the radial distance off axis. The aberration 
figure for curvature of field is plotted in Figure 2.10. The figure 


Figure 2.10: Curvature of field. 


schematically depicts a longitudinal section through the optical 
system. The solid curved rays represent an object point on axis, 
for which the image plane z_I coincides with the Gaussian image
plane. The broken curved rays represent the aberrated rays for an 
object point off axis, for which the plane of best focus lies on the 
curve S. The focal surface for off-axis object points can be gen- 
erated in three dimensions by rotating the curve labeled S about 
the central optic axis. This surface therefore has axial symmetry. 


The aberrations represented by the terms proportional to E and e
in (2.216, 2.217) represent isotropic distortion and anisotropic dis- 
tortion, respectively. The aberration figures for positive and neg- 
ative isotropic distortion are plotted in Figure 2.11. Isotropic dis- 
tortion results in a radial displacement of the image point by an 
amount which is proportional to the cube of the radial distance off 
axis. The aberration figures for positive and negative anisotropic
distortion are plotted in Figure 2.12. Anisotropic distortion results
in an azimuthal displacement of the image point by an amount
which is proportional to the cube of the radial distance off axis.


Figure 2.12: Anisotropic distortion, (a) positive, (b) negative.


The aberrations represented by the terms proportional to F and f
in (2.216, 2.217) represent isotropic coma and anisotropic coma, re-
spectively. The aberration figure for coma is plotted in Figure 2.13.
Each circle represents the aberrated transverse position in the 


Figure 2.13: Coma. 


Gaussian image plane for a given radial coordinate in the aper- 
ture plane. The smallest circle corresponds to the zone nearest 
to the center of the aperture, while the largest circle corresponds 
to the zone nearest to the rim of the aperture. The figure with 
all of the circles is plotted for a single point in the object plane. 
Physically, coma arises because the transverse magnification varies 
with radial coordinate in the aperture plane. Isotropic coma has 
a figure which is aligned with the radial direction in the Gaussian 
image plane, while anisotropic coma has a figure which is oriented 
in the azimuthal direction. The intensity represents a blur which
resembles a comet, hence the name coma. One can show that the 
angle between the two lines which form the envelope for all of the 
circles is always 60 degrees. 
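
The 60 degree opening angle can be checked directly from the isotropic coma terms. The Python sketch below is illustrative and rests on an assumption: with the object point on the x-axis, the F-terms of (2.219) give a displacement proportional to (3 x_A² + y_A², 2 x_A y_A), with the common factor F x_O dropped. For an aperture zone of radius r this traces a circle of radius r² centred at 2 r², so every circle subtends the same wedge as seen from the paraxial image point.

```python
import math


def coma_displacement(zone_radius, theta, strength=1.0):
    """Isotropic coma terms of (2.219), up to the common factor F x_O."""
    xa = zone_radius * math.cos(theta)
    ya = zone_radius * math.sin(theta)
    dx = strength * (3.0 * xa * xa + ya * ya)
    dy = strength * (2.0 * xa * ya)
    return dx, dy


def envelope_angle_degrees(samples=3600):
    """Full opening angle of the wedge that encloses all coma circles."""
    half_angle = 0.0
    for i in range(samples):
        theta = 2.0 * math.pi * i / samples
        dx, dy = coma_displacement(1.0, theta)
        half_angle = max(half_angle, abs(math.atan2(dy, dx)))
    return 2.0 * math.degrees(half_angle)


print(envelope_angle_degrees())   # approximately 60
```

The half-angle is arcsin(1/2) = 30 degrees because each circle has a radius equal to half the distance of its centre from the paraxial image point.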
2.5.8 Chromatic aberration


We now turn our attention to the aberration which arises when an 
individual particle has a kinetic energy which differs infinitesimally
by δΦ from the nominal kinetic energy on axis Φ(z). This arises
quite commonly, as every practical particle source emits with a
range or spread of kinetic energies at the emission surface. As the
focusing action of electric and magnetic fields depends on the par-
ticle energy, we expect an aberration, called the chromatic aber-
ration, to occur. This name originates from the analogy between
particle optics and light optics, where the color or chromaticity of
the light is directly related to the photon energy. The following
analysis is based on that of Zworykin et al. [94].


The problem can be stated mathematically as follows: given the
solution v(z) to the paraxial ray equation for energy Φ(z) on axis,
find the aberration δv_I in the Gaussian image plane, arising from
a constant perturbation δΦ in the energy. We begin by recalling
that v(z) is the general solution (2.167) to the paraxial ray equa-
tion (2.162) in the rotated system. Expanding the first derivative
in (2.162), we obtain the equivalent paraxial ray equation


v''(z) + \left[ p^{-2} \Phi' (1+\Phi) \right] v'(z)
       + \left[ \frac{1}{2} p^{-2} \Phi'' (1+\Phi) + \frac{1}{4} p^{-2} B^2 \right] v(z) = 0. (2.239)


We define the perturbed ray and energy as


v_1(z) = v(z) + \delta v_1(z)
\Phi_1(z) = \Phi(z) + \delta\Phi, (2.240)


respectively. We assume δΦ = const.


We know a priori that v_1(z) must be a solution to the paraxial
ray equation (2.239) with energy Φ_1(z). Substituting (2.240) into
(2.239), and canceling the unperturbed terms, it is tedious, but
straightforward to show that δv_1(z) satisfies

(\delta v_1)'' + \left[ p^{-2} \Phi' (1+\Phi) \right] (\delta v_1)'
    + \left[ \frac{1}{2} p^{-2} \Phi'' (1+\Phi) + \frac{1}{4} p^{-2} B^2 \right] (\delta v_1)
  = (\delta\Phi) p^{-2} \left\{ (2 p^{-2} + 1) \Phi' v'
    + \frac{1}{2} \left[ (2 p^{-2} + 1) \Phi'' + p^{-2} (1+\Phi) B^2 \right] v \right\}, (2.241)


retaining only terms to first order in δv_1 and δΦ. This is an in-
homogeneous second order differential equation in δv_1, due to the
nonzero right-hand side.


The general solution to such an inhomogeneous equation can al- 
ways be expressed as the sum of the solution to the homogeneous 
equation, plus any particular solution to the inhomogeneous equa-
tion. The left side is identical with the left side of the paraxial
ray equation (2.239), with v(z) replaced by δv_1(z). The homoge-
neous solution is therefore (2.167), with v(z) replaced by δv_1(z).
The independent solutions g(z) and h(z) are replaced by perturbed
solutions, designated by g + δg and h + δh, respectively. The pertur-
bations δg and δh do not appear in the first order approximation
(2.241), however, and can be ignored.


The general solution for δv_1(z_I), evaluated in the Gaussian im-
age plane of the unperturbed ray v(z) is


\delta v_1(z_I) = -\frac{M}{k} \int_{z_O}^{z_I} p(z) S(z) h(z) \, dz, (2.242)


where M is the magnification, k is the conserved Wronskian (2.165,
2.169), and S(z) is the right-hand side of (2.241), namely,


S(z) = (\delta\Phi) p^{-2} \left\{ (2 p^{-2} + 1) \Phi' v'
       + \frac{1}{2} \left[ (2 p^{-2} + 1) \Phi'' + p^{-2} (1+\Phi) B^2 \right] v \right\}. (2.243)
The solution (2.242) is derived in Appendix B, along with the gen-
eral method of solving an inhomogeneous second order differential
equation.


Having solved for the perturbation $\delta v_1(z_I)$, we must now express
this in the unperturbed coordinate system $v(z)$. Because the ray
$v_1(z)$ has energy $\Phi_1$, which differs infinitesimally from $\Phi$, it
follows that the rotation $\chi(z)$ differs by an infinitesimal amount
$\delta\chi$ between the $v$- and $v_1$-systems. Applying this rotation, we
define the perturbation $\delta v(z_I)$ in the unperturbed coordinates $v(z)$
according to

$$v + \delta v = v_1\,e^{-i\,\delta\chi} \approx (v + \delta v_1)(1 - i\,\delta\chi). \tag{2.244}$$


Expanding this, the aberration expressed in the Gaussian image
plane $z_I$ of the unperturbed system is

$$\delta v(z_I) = \delta v_1(z_I) - i\,v(z_I)\,\delta\chi_{OI}, \tag{2.245}$$


retaining only terms to first order in small quantities. From
(2.157), the perturbation $\delta\chi_{OI}$ is found to be

$$\delta\chi_{OI} = \frac{1}{2}\int_{z_O}^{z_I} \delta(p^{-1})\,B\,dz
 = -(\delta\Phi)\,\frac{1}{2}\int_{z_O}^{z_I} p^{-3}(1+\Phi)\,B\,dz, \tag{2.246}$$

where we have made use of (2.124), the definition of the on-axis
kinetic momentum $p$. The minus sign expresses the fact that the
rotation $\chi$ is smaller for higher particle energy, $\delta\Phi > 0$.
Substituting (2.242, 2.243, 2.246) into (2.245), and making use of the
solution (2.167), we find, after collecting terms,


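$$\delta v_I = (\delta\Phi)\left[(C_1 + iC_2)\,v_O + C_3\,v_A\right], \tag{2.247}$$

where we have defined the chromatic aberration coefficients $C_1$, $C_2$,
and $C_3$ in natural units as

$$\begin{aligned}
C_1 &= -\frac{M}{k}\int_{z_O}^{z_I} p^{-1}\left\{(2p^{-2}+1)\,\Phi' g' h
 + \tfrac{1}{2}\left[(2p^{-2}+1)\,\Phi'' + p^{-2}(1+\Phi)\,B^2\right] g h\right\} dz \\
C_2 &= \frac{M}{2}\int_{z_O}^{z_I} p^{-3}(1+\Phi)\,B\,dz \\
C_3 &= -\frac{M}{k}\int_{z_O}^{z_I} p^{-1}\left\{(2p^{-2}+1)\,\Phi' h' h
 + \tfrac{1}{2}\left[(2p^{-2}+1)\,\Phi'' + p^{-2}(1+\Phi)\,B^2\right] h^2\right\} dz.
\end{aligned} \tag{2.248}$$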
By inspection, the three terms in (2.247) represent, in order, field
magnification, field rotation, and defocus. This last is independent
of field position. The chromatic aberration (2.247) is referred to
the object by substituting $\delta v_O = M^{-1}\,\delta v_I$. In the
non-relativistic limit we make the following substitutions in (2.248):

$$1+\Phi \to 1, \qquad p \to \sqrt{2\Phi}, \qquad 2p^{-2}+1 \to \Phi^{-1}. \tag{2.249}$$


The chromatic aberration $\delta v_I$ can be expressed in Cartesian
coordinates in the rotated system by substituting $v = x + iy$. Upon
separating the real and imaginary terms, this gives (2.247)

$$\begin{aligned}
\delta x_I &= (\delta\Phi)\,(C_1\,x_O - C_2\,y_O + C_3\,x_A) \\
\delta y_I &= (\delta\Phi)\,(C_1\,y_O + C_2\,x_O + C_3\,y_A).
\end{aligned} \tag{2.250}$$


This represents the solution for the chromatic aberration in the
rotated coordinate system. The reader is reminded that $\delta x_I$ and
$\delta y_I$ are quantitatively identical in natural units and in SI units,
since the dimension of length is the same in both sets of units.
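
The equivalence between the complex form (2.247) and the Cartesian form (2.250) can be checked numerically. The following is a minimal sketch in Python; the function names and all sample values are illustrative assumptions, not taken from the text.

```python
def chromatic_shift(d_phi, c1, c2, c3, v_object, v_aperture):
    """Complex form (2.247): d_phi * ((C1 + i C2) v_O + C3 v_A).

    v_object and v_aperture are complex numbers x + i y in the rotated
    coordinate system; d_phi is the energy deviation (delta Phi).
    """
    return d_phi * ((c1 + 1j * c2) * v_object + c3 * v_aperture)


def chromatic_shift_cartesian(d_phi, c1, c2, c3, x_o, y_o, x_a, y_a):
    """Cartesian form (2.250), written out component by component."""
    dx = d_phi * (c1 * x_o - c2 * y_o + c3 * x_a)
    dy = d_phi * (c1 * y_o + c2 * x_o + c3 * y_a)
    return dx, dy


# Hypothetical sample values, chosen only to exercise both forms.
dv = chromatic_shift(1e-5, 0.3, -0.2, 50.0, complex(1e-3, 2e-3), complex(5e-3, -1e-3))
dx, dy = chromatic_shift_cartesian(1e-5, 0.3, -0.2, 50.0, 1e-3, 2e-3, 5e-3, -1e-3)
```

The real and imaginary parts of `dv` should agree with `dx` and `dy` to rounding error, which confirms that $C_1$ scales the field position, $C_2$ rotates it, and $C_3$ multiplies only the aperture coordinate.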


2.5.9 Intensity point spread function 


The net effect of geometrical aberrations and defocus is that, even 
in the limit of a hypothetical ideal point object, the image is not 
a point, but is blurred. The amount of blurring gives a direct esti- 
mate of the quality of the image. In classical geometrical optics, all 
relevant information about the image is contained in the intensity 
as a function of position in the transverse plane. The physical im- 
age intensity can be regarded as a two-dimensional convolution of 
the ideal image intensity with an intensity point spread function. 
The intensity point spread function is the image of a hypothetical 
ideal point object, in the presence of aberrations and defocus. A 
method for calculating the intensity point spread function is de-
rived by Gallatin [32]. In this section, we give an alternative deriva- 
tion which is consistent with the foregoing analysis, and leads to 
the same result obtained by Gallatin. 


In mathematical terms, we define the intensity point spread
function $I(\mathbf{x}_I)$ as the two-dimensional intensity distribution in
the Gaussian image plane for an ideal point object, in the classical
limit of geometrical optics. We wish to obtain an analytic
expression for $I(\mathbf{x}_I)$, given the aberrations and defocus. We can
write

$$I(\mathbf{x}_I) = \int d^2P_I\;\rho_I(\mathbf{x}_I, \mathbf{P}_I), \tag{2.251}$$

where $\mathbf{x}_I = (x_I, y_I)$ is the transverse position in the rotated
system, $\mathbf{P}_I = (P_{Ix}, P_{Iy})$ is the transverse canonical momentum,
and $\rho_I(\mathbf{x}_I, \mathbf{P}_I)$ is the phase space density, all defined in the
Gaussian image plane. We have integrated over all momentum components,
to obtain the intensity as a function of transverse position only.


Any optical system, however complicated, can be analyzed in 
terms of an equivalent system consisting of two lenses. This is 
shown schematically in Figure 2.14. In the equivalent system the 
object plane coincides with the front focal plane of the first lens. 


Figure 2.14: Equivalent confocal system, with lenses $L_1$ and $L_2$ and aperture $A$.

A physical aperture is located at the back focal plane of the first
lens, which coincides with the front focal plane of the second lens.
It is easy to verify from the figure that the Gaussian image plane
coincides with the back focal plane of the second lens. Both lenses
are assumed to be ideal in the equivalent system. Each ray emanating
from a point object intersects the aperture plane at a unique
transverse position $\mathbf{x}_A = (x_A, y_A)$. Because the lenses are assumed
to be perfect, every ray has the same optical path length between
object and image. In the real system, the various rays have differing
optical path length, owing to the aberrations.


The optical path difference for the primary aberration is given
by (2.223, 2.159)

$$W_I = \int_{z_O}^{z_I} m\,dz \tag{2.252}$$

for the real system. We assume this to be known for each ray from
the preceding analysis. We now assume that all aberration of the
real system for a particular ray is concentrated in the aperture
plane of the equivalent system, manifest as an optical path
difference $W_A$ from ideal.


In mathematical terms, we wish to find the intensity point spread
function (2.251) in the Gaussian image plane, given the optical
path difference (2.252) for each ray in the real system. For the
equivalent system we can express the momentum element in the
Gaussian image plane in terms of a unique area element in the
aperture plane as

$$d^2P_I = \left(\frac{\partial P_{Ix}}{\partial x_A}\frac{\partial P_{Iy}}{\partial y_A}
 - \frac{\partial P_{Ix}}{\partial y_A}\frac{\partial P_{Iy}}{\partial x_A}\right) d^2x_A
 = \left(\frac{p_I}{f}\right)^2 d^2x_A, \tag{2.253}$$

where the large parenthesis is the Jacobian determinant, and where
we have made use of (2.196)

$$\mathbf{P}(z) = p(z)\left[\mathbf{x}_O\,g'(z) + \mathbf{x}_A\,h'(z)\right], \qquad
\mathbf{P}_I = p_I\left(\mathbf{x}_O\,g'_I + \mathbf{x}_A\,h'_I\right) \tag{2.254}$$

in the paraxial approximation, where $p(z)$ is the scalar kinetic
momentum on axis. Also, $h'_I = 1/f$, where $f$ is the focal length of the
final lens in the ideal system.


From Liouville’s theorem we can relate the phase space density
in the aperture and image planes of the equivalent system as
follows:

$$\rho_I(\mathbf{x}_I, \mathbf{P}_I) = \rho_A(\mathbf{x}_A, \mathbf{P}_A). \tag{2.255}$$

It follows that the intensity in the image plane can be written as
(2.251, 2.253, 2.255)

$$I(\mathbf{x}_I) = \left(\frac{p_I}{f}\right)^2 \int d^2x_A\;\rho_A(\mathbf{x}_A, \mathbf{P}_A). \tag{2.256}$$

This expresses the intensity in the Gaussian image plane entirely in
terms of quantities in the aperture plane of the equivalent system.
The reason for choosing the equivalent system becomes clear from
this. We can assume a simple form for $\rho_A$ as follows:

$$\rho_A(\mathbf{x}_A, \mathbf{P}_A) = T(\mathbf{x}_A)\,\delta(\mathbf{P}_A - \nabla W_A), \tag{2.257}$$

where $T(\mathbf{x}_A)$ is assumed to be uniform over the aperture,
corresponding to uniform illumination. Normalizing the area integral
to unity, we set

$$T(\mathbf{x}_A) = 1/A \tag{2.258}$$

inside the aperture, where we define $A$ as the area of the aperture
in the equivalent system.


The momentum distribution in (2.257) is a Dirac delta function,
where $\nabla W_A$ is the two-dimensional gradient in the aperture plane
(2.60). From (2.109) this is the transverse canonical momentum
in the aperture plane. In the limit of perfect imaging, the
surfaces of constant optical path are planar in the space between the
two lenses of the equivalent system. This corresponds to a parallel
beam of rays originating from a single object point. The
intersection of these surfaces with the aperture plane forms straight
lines (for a general off-axis object point), which represent the
contours of $W_A = \text{const}$. In the general case with aberrations, these
contours are curved, owing to the gradient of the optical path
difference across the aperture plane of the equivalent system. From
(2.196)

$$\mathbf{P}_A = p_A\left(\mathbf{x}_O\,g'_A + \mathbf{x}_A\,h'_A\right) = p_A\,\mathbf{x}_I/f, \tag{2.259}$$

where we have made use of

$$\mathbf{x}_O = \frac{\mathbf{x}_I}{M}, \qquad g'_A = \frac{M}{f}, \qquad h'_A = 0. \tag{2.260}$$

We also assume the equivalent system to be monoenergetic in the
space between the aperture plane $z_A$ and the image plane $z_I$. It
follows that $p_A = p_I = p$. From (2.256, 2.257, 2.258, 2.259)


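$$I(\mathbf{x}_I) = \frac{1}{A}\int dx_A \int dy_A\;
 \delta\!\left(\frac{f}{p}\frac{\partial W_A}{\partial x_A} - x_I\right)
 \delta\!\left(\frac{f}{p}\frac{\partial W_A}{\partial y_A} - y_I\right), \tag{2.261}$$

where we have made use of the properties of the delta function:

$$\delta(-x) = \delta(x), \qquad \delta(ax) = \frac{1}{a}\,\delta(x). \tag{2.262}$$

To apply these properties, we first define a function

$$F(a,b) = \int dx \int dy\;\delta[u(x,y) - a]\;\delta[v(x,y) - b]. \tag{2.263}$$

In order to evaluate this, we must transform from the set $(x, y)$ to
the set $(u, v)$. This involves the Jacobian determinant,

$$du\,dv = \left(\frac{\partial u}{\partial x}\frac{\partial v}{\partial y}
 - \frac{\partial u}{\partial y}\frac{\partial v}{\partial x}\right) dx\,dy
 = D(u,v)\,dx\,dy. \tag{2.264}$$

With the coordinate transformation complete, the integral is
evaluated using the property of the delta function,

$$F(a,b) = \int du \int dv\;\frac{1}{D(u,v)}\,\delta(u-a)\,\delta(v-b) = \frac{1}{D(a,b)}. \tag{2.265}$$

Applying this mathematical formalism to the present problem, we
define (2.261)

$$u(x_A, y_A) = \frac{f}{p}\frac{\partial}{\partial x_A} W_A(x_A, y_A), \qquad
v(x_A, y_A) = \frac{f}{p}\frac{\partial}{\partial y_A} W_A(x_A, y_A). \tag{2.266}$$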
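We assume $u(x_A, y_A)$ and $v(x_A, y_A)$ to be known functions, as
$W_A(x_A, y_A)$ is known a priori. Substituting, we find

$$D(u,v) = \frac{\partial u}{\partial x_A}\frac{\partial v}{\partial y_A}
 - \frac{\partial u}{\partial y_A}\frac{\partial v}{\partial x_A}
 = \left(\frac{f}{p}\right)^2 \left(
 \frac{\partial^2 W_A}{\partial x_A^2}\frac{\partial^2 W_A}{\partial y_A^2}
 - \frac{\partial^2 W_A}{\partial y_A\,\partial x_A}\frac{\partial^2 W_A}{\partial x_A\,\partial y_A}\right). \tag{2.267}$$

Consistent with the delta function (2.261), we set

$$u(x_A, y_A) = x_I, \qquad v(x_A, y_A) = y_I, \tag{2.268}$$

and invert this pair to solve for $(x_A, y_A)$ in terms of $(x_I, y_I)$. Since
$W_A(x_A, y_A)$ is typically represented as a polynomial with terms
$x_A^m y_A^n$, this inversion amounts to finding the roots of a polynomial.

Using this new solution for $(\tilde{x}_A, \tilde{y}_A)$, we form

$$D[u(\tilde{x}_A, \tilde{y}_A),\, v(\tilde{x}_A, \tilde{y}_A)] = D(x_I, y_I). \tag{2.269}$$

The final result for the intensity point spread function is then

$$I(x_I, y_I) = \frac{1}{A\,D(x_I, y_I)}. \tag{2.270}$$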


Applying this procedure for any point in an extended object, one
constructs the intensity point spread function about the
corresponding image point in the limit of geometrical optics. This is the main
result of this section. 
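
As an illustration of the procedure (2.266) to (2.270), consider the simplest case of pure defocus, for which the inversion is trivial. The sketch below is in Python and assumes a hypothetical optical path difference $W_A = \tfrac{1}{2} c\,(x_A^2 + y_A^2)$ with a nonzero defocus coefficient `c`, and a circular aperture of radius `aperture_radius`; these names and the choice of $W_A$ are assumptions made for illustration only. Rays that would originate outside the aperture are assigned zero intensity, consistent with $T(\mathbf{x}_A) = 1/A$ holding only inside the aperture.

```python
import math


def psf_defocus(x_i, y_i, c, f, p, aperture_radius):
    """Geometrical point spread function (2.270) for W_A = c (x^2 + y^2) / 2.

    c is a hypothetical (nonzero) defocus coefficient, f the focal length
    of the final lens and p the momentum, as in (2.266).
    """
    scale = f * c / p                      # u = scale * x_A, v = scale * y_A
    x_a, y_a = x_i / scale, y_i / scale    # inversion of (2.268)
    if math.hypot(x_a, y_a) > aperture_radius:
        return 0.0                         # no ray through the aperture lands here
    area = math.pi * aperture_radius ** 2  # aperture area A
    jacobian = scale ** 2                  # D from (2.267): Hessian determinant is c^2
    return 1.0 / (area * jacobian)


# Hypothetical values in arbitrary consistent units.
intensity_on_axis = psf_defocus(0.0, 0.0, c=2.0, f=0.5, p=1.0, aperture_radius=0.1)
```

For this case the image of a point is a uniform disc of radius $|fc/p|$ times the aperture radius, and the intensity multiplied by the disc area is unity, as expected from the normalization (2.258).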
2.6 Stochastic Coulomb scattering


In an earlier section, we discussed the effect of space charge on 
a hypothetical test particle in the beam. The space charge was 
regarded as a continuum. Approximating the current density as 
uniform within the beam volume, we showed that the space charge 
acts as a negative lens. In the paraxial approximation, the net re- 
sult of the space charge is defocusing. The space charge lens also 
has aberrations, which can be calculated in principle. 


This approximation does not precisely agree with experiment, 
however. The first indication of this was seen by Boersch [7], who 
measured significant energy broadening in an electron beam, which 
grew monotonically with current. This could not be explained 
by any continuum approximation. This took on practical signif- 
icance with the advent of electron beam lithography, for which 
the Coulomb interaction places a limit on the useful writing cur- 
rent for a given resolution. This in turn limits the throughput to 
values which are slow compared with optical lithography. 


A beam of particles can be regarded more realistically as a col- 
lection of discrete, moving point charges, distributed randomly in 
space within the beam volume. Every particle exerts a Lorentz 
force (2.15) on every other particle, resulting in a random dis- 
placement of each particle. The effect becomes more pronounced 
as the beam current is increased, owing to the closer proximity of 
beam particles. It also becomes stronger as the interaction time is 
increased, as the particles have more time to interact. The inter- 
action time increases as the length of the beam path increases, or 
the beam energy decreases. 


A rough estimate of the relative strength of the interaction is ob- 
tained from the average axial spacing between particles, given by 
the charge times the velocity divided by the current. In a typical 
electron microscope, no more than one electron is in the column at 
any given instant on average. Coulomb scattering is unimportant 
in this case. In a typical electron or ion beam lithography system, 
one uses the highest possible current, without unacceptably
degrading the resolution. Consequently, the axial distance between
particles can be on the order of micrometers. In this case the
interaction is important enough that it imposes a limit on the useful
writing current for a given resolution.
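
The average axial spacing described above, charge times velocity divided by current, is easy to evaluate. The following Python sketch does so for electrons, using the relativistic velocity. The two operating points in the usage lines (100 keV at 10 pA, and 50 keV at 10 µA) are assumed values chosen only to illustrate the two regimes; they are not taken from the text.

```python
import math

E_CHARGE = 1.602176634e-19     # elementary charge, C
E_REST_ENERGY_EV = 510998.95   # electron rest energy, eV
C_LIGHT = 299792458.0          # speed of light, m/s


def electron_speed(kinetic_ev):
    """Relativistic electron speed (m/s) for a given kinetic energy (eV)."""
    gamma = 1.0 + kinetic_ev / E_REST_ENERGY_EV
    return C_LIGHT * math.sqrt(1.0 - 1.0 / gamma ** 2)


def axial_spacing(kinetic_ev, current_a):
    """Average axial distance between electrons: charge * velocity / current."""
    return E_CHARGE * electron_speed(kinetic_ev) / current_a


# Assumed operating points, for illustration only.
spacing_microscope = axial_spacing(100e3, 10e-12)   # meters
spacing_lithography = axial_spacing(50e3, 10e-6)    # meters
```

With these assumed numbers the first spacing comes out at a few meters, longer than a typical column, while the second is of the order of micrometers.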


2.6.1 Monte Carlo simulation 


Stochastic Coulomb scattering is a many-body interaction involv- 
ing a large number of particles. As such, the detailed motion of 
every particle cannot be solved in closed form, even if the initial 
position and velocity of every particle were known. Fortunately, a 
great deal of understanding can be gained by treating the inter- 
action statistically. Numerical Monte Carlo simulation [26, 39, 76] 
provides a tool for accurately predicting the relevant performance 
parameters for a given system configuration. A pseudo-random 
number generator is used to initialize the positions and veloci- 
ties of many particles in the vicinity of the source, with the beam 
voltage and source current taken into account. The motion of ev- 
ery particle is then traced numerically through the optical system, 
with the Lorentz force due to every other particle taken into ac- 
count at every step. 


The Lorentz force is the vector sum of an electrostatic force and a
magnetic force (2.15). This is shown schematically in Figure 2.15.
A particle labeled $a$ with charge $q_a$ and mass $m_a$ is at position
$\mathbf{r}_a$ with velocity $\mathbf{v}_a$ in the lab frame. This particle experiences a
Lorentz force due to a second particle labeled $b$ with charge $q_b$
and mass $m_b$ at position $\mathbf{r}_b$, with velocity $\mathbf{v}_b$. The Lorentz force on
particle $a$ due to particle $b$ is given in the lab frame by (2.15)

$$\frac{d}{dt}(\gamma_a m_a \mathbf{v}_a) = q_a\left(\mathbf{E}_{ab} + \mathbf{v}_a \times \mathbf{B}_{ab}\right), \tag{2.271}$$

Figure 2.15: Lorentz force on a particle due to a second particle.


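where $\mathbf{E}_{ab}$ is the electric field at particle $a$ due to particle $b$, and
$\mathbf{B}_{ab}$ is the magnetic field at particle $a$ due to particle $b$. We can
assume without loss of generality that the vectors $\mathbf{v}_b$ and $\mathbf{r}_a - \mathbf{r}_b$
determine a plane, and this plane coincides with the plane of the
page in Figure 2.15. The electric field at the position of particle $a$
due to particle $b$ is given by

$$\mathbf{E}_{ab} = \frac{q_b}{4\pi\epsilon_0}\,\frac{\mathbf{r}_a - \mathbf{r}_b}{|\mathbf{r}_a - \mathbf{r}_b|^3}. \tag{2.272}$$

The magnetic field is given by

$$\mathbf{B}_{ab} = \frac{\mu_0 q_b}{4\pi}\,\frac{\mathbf{v}_b \times (\mathbf{r}_a - \mathbf{r}_b)}{|\mathbf{r}_a - \mathbf{r}_b|^3}. \tag{2.273}$$

This field points into the plane of the page for positive charge $q_b$.
The Lorentz force on particle $a$ is then

$$\frac{d}{dt}(\gamma_a m_a \mathbf{v}_a) = \frac{q_a q_b}{4\pi\epsilon_0 |\mathbf{r}_a - \mathbf{r}_b|^3}
 \left\{(\mathbf{r}_a - \mathbf{r}_b) + \frac{1}{c^2}\,\mathbf{v}_a \times \left[\mathbf{v}_b \times (\mathbf{r}_a - \mathbf{r}_b)\right]\right\}, \tag{2.274}$$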
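where we have made use of $\mu_0\epsilon_0 = 1/c^2$. Expanding the double
cross product, and summing over all particles $b$, the total vector
Lorentz force on particle $a$ due to all other particles is

$$\frac{d}{dt}(\gamma_a m_a \mathbf{v}_a) = \sum_b \frac{q_a q_b}{4\pi\epsilon_0 |\mathbf{r}_a - \mathbf{r}_b|^3}
 \left\{(\mathbf{r}_a - \mathbf{r}_b)\left(1 - \frac{\mathbf{v}_a \cdot \mathbf{v}_b}{c^2}\right)
 + \frac{1}{c^2}\,\mathbf{v}_b\left[\mathbf{v}_a \cdot (\mathbf{r}_a - \mathbf{r}_b)\right]\right\}. \tag{2.275}$$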


To this point we have made no approximations. 


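We now approximate that $d\gamma_a/dt \approx 0$. The resultant acceleration
$\mathbf{a}_a$ of particle $a$ is then given by

$$\frac{d\mathbf{v}_a}{dt} = \sum_b \frac{q_a q_b}{4\pi\epsilon_0\,\gamma_a m_a\,|\mathbf{r}_a - \mathbf{r}_b|^3}
 \left\{(\mathbf{r}_a - \mathbf{r}_b)\left(1 - \frac{\mathbf{v}_a \cdot \mathbf{v}_b}{c^2}\right)
 + \frac{1}{c^2}\,\mathbf{v}_b\left[\mathbf{v}_a \cdot (\mathbf{r}_a - \mathbf{r}_b)\right]\right\}. \tag{2.276}$$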


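We are now in a position to numerically compute the trajectory of
particle $a$. Dropping the subscript $a$, we can write a Taylor series
for the trajectory point $j+1$ in terms of the point $j$ as

$$\begin{aligned}
\mathbf{r}_{j+1} &= \mathbf{r}_j + \mathbf{v}_j(\Delta t) + \tfrac{1}{2}\,\mathbf{a}_j(\Delta t)^2 + \tfrac{1}{6}\,\dot{\mathbf{a}}_j(\Delta t)^3 + \cdots \\
\mathbf{v}_{j+1} &= \mathbf{v}_j + \mathbf{a}_j(\Delta t) + \tfrac{1}{2}\,\dot{\mathbf{a}}_j(\Delta t)^2 + \cdots \\
\mathbf{a}_{j+1} &= \mathbf{a}_j + \dot{\mathbf{a}}_j(\Delta t) + \cdots
\end{aligned} \tag{2.277}$$

where the time increment is $\Delta t = t_{j+1} - t_j$. The quantity $\dot{\mathbf{a}}_j$ is the
time rate of change of the acceleration $\mathbf{a}_j$. This can be calculated
analytically by time differentiation of the expression for the
acceleration. This is left as an exercise for the reader. This procedure
is repeated for all of the particles in the sample.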
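
The acceleration (2.276) and one step of the Taylor series (2.277) can be sketched in a few lines of Python. This is a minimal illustration and not the algorithm of the cited references: it truncates the series after the acceleration term (the $\dot{\mathbf{a}}_j$ term is omitted, since its expression is left as an exercise), it uses plain tuples for vectors, and the helper names are hypothetical.

```python
import math

EPS0 = 8.8541878128e-12   # vacuum permittivity, F/m
C = 299792458.0           # speed of light, m/s


def add(a, b):
    return tuple(x + y for x, y in zip(a, b))


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def scale(s, a):
    return tuple(s * x for x in a)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def acceleration(i, pos, vel, charge, mass):
    """Acceleration of particle i from (2.276), assuming d(gamma)/dt ~ 0."""
    va = vel[i]
    gamma = 1.0 / math.sqrt(1.0 - dot(va, va) / C ** 2)
    total = (0.0, 0.0, 0.0)
    for j in range(len(pos)):
        if j == i:
            continue
        r = sub(pos[i], pos[j])                      # r_a - r_b
        prefactor = charge[i] * charge[j] / (
            4.0 * math.pi * EPS0 * gamma * mass[i] * dot(r, r) ** 1.5)
        term = add(scale(1.0 - dot(va, vel[j]) / C ** 2, r),
                   scale(dot(va, r) / C ** 2, vel[j]))
        total = add(total, scale(prefactor, term))
    return total


def taylor_step(pos, vel, charge, mass, dt):
    """Advance all particles by dt, keeping terms through a_j in (2.277)."""
    acc = [acceleration(i, pos, vel, charge, mass) for i in range(len(pos))]
    new_pos = [add(add(r, scale(dt, v)), scale(0.5 * dt * dt, a))
               for r, v, a in zip(pos, vel, acc)]
    new_vel = [add(v, scale(dt, a)) for v, a in zip(vel, acc)]
    return new_pos, new_vel
```

All accelerations are evaluated from the positions and velocities at step $j$ before any particle is moved, so that every pair interaction within a step uses a consistent configuration. For two particles with equal velocities and purely transverse separation, `acceleration` reproduces the $1/\gamma^3$ reduction relative to the rest-frame Coulomb value.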


The physical significance of the interaction can be better
appreciated by noticing that $\mathbf{v}_a \approx \mathbf{v}_b$. Ignoring the last term on the right
of the expression (2.276) for the acceleration, we obtain

$$\frac{d\mathbf{v}_a}{dt} = \sum_b \frac{q_a q_b}{4\pi\epsilon_0\,m_a\gamma^3}\,
 \frac{\mathbf{r}_a - \mathbf{r}_b}{|\mathbf{r}_a - \mathbf{r}_b|^3}. \tag{2.278}$$

This closely resembles the acceleration due to a pure Coulomb
force, but is reduced by a factor of $1/\gamma^3$. This reflects the
relativistic mass $\gamma m_a$, and a factor of $1/\gamma^2$ expressing the canceling
nature of the electrostatic and magnetic forces at relativistic beam
energies.


As the time step $\Delta t$ is decreased, the particle displacement
approaches a stable value. The number of computation steps is
inversely proportional to $\Delta t$, so one naturally chooses the largest
$\Delta t$ for which the displacement adequately approximates the stable
end value. Including more terms in the Taylor expansion improves
the convergence in a nonlinear way [88]. It can be shown that the
truncation error for a given $\Delta t$ is inversely proportional to $n^m$,
where $n$ is the number of integration steps, and $m$ is the order of
the highest order term in the Taylor series.


One increases the particle sample size N until a stable limiting 
value of the displacement is obtained. The number of computa- 
tions for each time step is N(N — 1)/2. Since this is quadratic in 
N, it has a large impact on the overall computation time for large 
N. The length of the simulated beam segment is proportional to 
N. The force on particles near the ends of the sample is improp- 
erly represented. To assess the importance of this, we imagine a 
half-space filled with charge of some uniform average density. A 
test particle on the boundary plane between the regions with and 
without charge experiences a repulsive force due to the charge. 
Considering the influence of a hemispherical shell of charge on the 
test particle, the strength of the force is inversely proportional to 
the square of the radius, and proportional to the total charge in 
the shell. This latter is proportional to the square of the radius. 
The net force is thus independent of the radius of the shell, and 
all shells have the same influence on the test particle, regardless of 
their radii. The range of the average Coulomb interaction is therefore
effectively infinite. It follows from these considerations that
the sample length must be much greater than the diameter of the 
beam for accurate simulation. This can lead to a very large num- 
ber of particles, and correspondingly long computation time. The 
stochastic contribution to the displacement is expected to have 
relatively short range, however, since the fluctuations average out 
over long distances. A technique exists to take advantage of this by 
separating the effects of the average and stochastic contributions 
[40]. The reader is referred to this reference for details. 


The above procedure applies to a drift length, with no external 
fields present. In principle, one can add these fields into the ex- 
pression for the Lorentz force. For many applications, a thin lens 
approximation suffices, in which the direction of the velocity is 
shifted toward the optic axis by an amount proportional to the ra- 
dial distance off axis. The magnitude of the velocity is unchanged. 
In this way, complex systems can be analyzed by Monte Carlo sim- 
ulation of a series of drift lengths separated by thin lenses. Such 
simulations have shown close agreement with experiment [51, 70].
Monte Carlo simulation thus offers a powerful predictive tool in 
the design of practical systems. 
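
The thin lens step described above can be sketched as follows. This Python fragment assumes that the optic axis is the $z$ axis and that the lens is characterized by a single focal length; both the function name and this parametrization are illustrative assumptions. The transverse slopes are reduced by an amount proportional to the off-axis distance, and the velocity is then rescaled so that its magnitude is unchanged.

```python
import math


def thin_lens(pos, vel, focal_length):
    """Deflect a particle at a thin lens located at its current z position.

    pos = (x, y, z) and vel = (vx, vy, vz), with z along the optic axis.
    The slopes dx/dz and dy/dz change by -x/f and -y/f; the speed is kept.
    """
    x, y, _ = pos
    vx, vy, vz = vel
    speed = math.sqrt(vx * vx + vy * vy + vz * vz)
    slope_x = vx / vz - x / focal_length
    slope_y = vy / vz - y / focal_length
    norm = math.sqrt(1.0 + slope_x ** 2 + slope_y ** 2)
    new_vz = math.copysign(speed / norm, vz)   # keep the direction of travel
    return (slope_x * new_vz, slope_y * new_vz, new_vz)


# A ray parallel to the axis, 1 mm off axis, through a lens with f = 0.1 m.
v_out = thin_lens((1.0e-3, 0.0, 0.0), (0.0, 0.0, 1.0e8), 0.1)
```

A simulation of a complete column would alternate drift segments, in which the particle-particle forces are integrated, with calls of this kind at each lens plane.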


Problem 


Calculate an analytic expression for the time rate of change $\dot{\mathbf{a}}_j$
in terms of the position $\mathbf{r}_j$, velocity $\mathbf{v}_j$, and acceleration $\mathbf{a}_j$.


2.6.2 Analytical approximation by Markov’s 
method of random flights 


Monte Carlo simulation is inherently accurate, due to the minimal 
assumptions needed to describe the physical process. However, it 
suffers from two disadvantages. First, it is computationally inten-
sive. For a given system configuration, a new simulation must be 
performed for every distinct operating point, and this can be very 
time-consuming. Second, it is difficult to achieve an intuitive un- 
derstanding of the physical process, such as the understanding one 
might obtain from an analytical theory. 


Given that the N-body problem cannot be solved analytically in 
closed form, it is of great interest to inquire whether a suitable ana- 
lytical approximation can be found. There is a history of attempts 
to express beam broadening due to stochastic Coulomb scattering 
by analytic approximations. Typically these approximations yield 
a simple algebraic dependence on experimental parameters such as 
beam energy, system length, beam current, and numerical aper- 
ture. The reader is referred to two excellent reviews by Kruit and 
Jansen [55] and by Jansen [49] for details. 


This approach has the advantage that the optical properties of 
a system can be estimated quickly and simply with some degree of 
accuracy. This facilitates an intuitive understanding of the depen- 
dency on experimental parameters. It has the disadvantage that 
the formulas depend on the specific system configuration and on 
the operating point. 


No simple, general formulation appears to exist. Also, it becomes 
necessary to independently evaluate the accuracy of the formula 
before relying on it for a detailed design of a system. Assuming the 
system has yet to be built, this evaluation relies on Monte Carlo
simulation. In practice, one uses a judicious combination of ana- 
lytic approximation and Monte Carlo simulation. 


In this study, we attempt an analytical analysis which does not 
result in such simple formulas, but strikes at the basic underly- 
ing statistical mechanics. The remainder of this section closely 
follows the earlier analysis by Groves [41]. We aim to derive a for- 
malism which lends itself well to numerical analysis, where this 
analysis is less computationally intensive than Monte Carlo sim- 
ulation. To this end, we first discuss the problem of random
flights, originally formulated and solved by Markov, and reviewed 
by Chandrasekhar [17]. We imagine a general physical process con- 
sisting of a number of independent steps of varying size. For ex- 
ample, the process might be the motion of a gas molecule, where 
the molecule is multiply scattered. Between scattering events, the 
molecule travels a random distance, which represents the length 
of a step. We wish to determine the probability that the molecule 
travels a given net distance after N scattering events. There are 
many other examples of this general process. A key assumption is 
that the probability of a single event is independent of past his- 
tory. Such a succession of events is known as a Markov chain. 


In mathematical terms, the problem can be stated as follows. We
assume the size of the $j$th step is governed by a probability
density $\tau_j(\mathbf{x}_j)$ that the step length will be $\mathbf{x}_j$. Given this, we wish
to find the probability density $W_N(\mathbf{X})$ for net displacement $\mathbf{X}$ after
$N$ steps with displacements $\mathbf{x}_j$, where $j = 1, \ldots, N$. This is
completely general, in that the vector quantities $\mathbf{x}$ and $\mathbf{X}$ can have
any dimensionality. The probability $W_N(\mathbf{X})$ is found by integrating
over all possible step lengths $\mathbf{x}_j$, subject to the constraint that
the individual steps must add up to give the desired displacement
$\mathbf{X}$. This is

$$W_N(\mathbf{X}) = \int \cdots \int \delta\Big(\sum_{j=1}^{N} \mathbf{x}_j - \mathbf{X}\Big)\,
 \tau_1(\mathbf{x}_1) \cdots \tau_N(\mathbf{x}_N)\; d\mathbf{x}_1 \cdots d\mathbf{x}_N, \tag{2.279}$$

where $\delta$ is the Dirac delta function, which ensures that only that
space is included in the integration, for which the constraint is
met. The delta function has an integral representation given by

$$\delta\Big(\sum_{j=1}^{N} \mathbf{x}_j - \mathbf{X}\Big) = \frac{1}{(2\pi)^n} \int d^n k\;
 \exp\Big[-i\,\mathbf{k} \cdot \Big(\sum_{j=1}^{N} \mathbf{x}_j - \mathbf{X}\Big)\Big], \tag{2.280}$$

where the integral is performed over the entire $n$-dimensional space
of the $n$-vector $\mathbf{k}$.
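Substituting (2.280) into (2.279), and interchanging order of
integrations, we find

$$W_N(\mathbf{X}) = \frac{1}{(2\pi)^n} \int d^n k\; \exp(i\,\mathbf{k} \cdot \mathbf{X})
 \prod_{j=1}^{N} \left[\int d\mathbf{x}_j\, \exp(-i\,\mathbf{k} \cdot \mathbf{x}_j)\, \tau_j(\mathbf{x}_j)\right]. \tag{2.281}$$

We identify the square bracket as the Fourier transform of $\tau_j(\mathbf{x}_j)$
defined by

$$\tilde{\tau}_j(\mathbf{k}) = \int d\mathbf{x}_j\, \exp(-i\,\mathbf{k} \cdot \mathbf{x}_j)\, \tau_j(\mathbf{x}_j). \tag{2.282}$$

At this point we assume that the same probability $\tau(\mathbf{x})$ governs
all individual steps $\mathbf{x}_j$. It is, therefore, permissible to drop the
subscript $j$, yielding

$$W_N(\mathbf{X}) = \frac{1}{(2\pi)^n} \int d^n k\; \exp(i\,\mathbf{k} \cdot \mathbf{X})\, [\tilde{\tau}(\mathbf{k})]^N. \tag{2.283}$$

We identify this as an inverse Fourier transform. This can be
abbreviated using a shorthand expression

$$\tilde{W}_N(\mathbf{k}) = [\tilde{\tau}(\mathbf{k})]^N. \tag{2.284}$$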


This represents the general solution to the problem of random 
flights. It can be understood by applying the convolution theorem 
of Fourier transforms. This says that the transform of a convo- 
lution of two functions is equal to the product of the individual 
transforms of the functions. In this case, the overall probability is 
an N-fold convolution of the single-step distribution function with 
itself, as one would naturally expect. It is evident that the problem 
is quite naturally expressed in terms of Fourier transforms. 
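
The statement (2.284) can be verified on a small discrete example. The Python sketch below takes a hypothetical one-dimensional step distribution on a lattice, computes the $N$-step distribution once by direct repeated convolution and once by raising the discrete Fourier transform to the $N$th power, and compares the two. The discrete transform stands in for the continuous one in (2.282); zero padding is used so that the periodic transform does not wrap around. All names are illustrative.

```python
import cmath


def convolve(a, b):
    """Direct linear convolution of two sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def dft(values, sign):
    """Naive discrete Fourier transform; sign = -1 forward, +1 inverse kernel."""
    n = len(values)
    return [sum(v * cmath.exp(sign * 2j * cmath.pi * k * m / n)
                for m, v in enumerate(values))
            for k in range(n)]


def n_step_direct(tau, n_steps):
    """N-fold convolution of the single-step distribution with itself."""
    w = [1.0]
    for _ in range(n_steps):
        w = convolve(w, tau)
    return w


def n_step_fourier(tau, n_steps):
    """Same distribution from the discrete analogue of (2.283) and (2.284)."""
    size = n_steps * (len(tau) - 1) + 1           # length of the N-fold convolution
    padded = list(tau) + [0.0] * (size - len(tau))
    tau_k = dft(padded, -1)                       # transform of a single step
    w_k = [t ** n_steps for t in tau_k]           # Nth power in Fourier space
    return [(x / size).real for x in dft(w_k, +1)]


# Hypothetical step distribution: steps of -1, 0, +1 lattice units.
tau = [0.25, 0.5, 0.25]
direct = n_step_direct(tau, 4)
fourier = n_step_fourier(tau, 4)
```

Entry `m` of either result is the probability of a net displacement of `m - 4` lattice units after four steps, and the two lists agree to rounding error.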


We now seek to apply this general mathematical approach to the 
specific problem of stochastic Coulomb scattering. We imagine a 
single particle, chosen at random, intersecting the target plane at 
some transverse position. This position is determined by the action 
of the optical system, together with the Coulomb scattering with
every other particle. If one could remove the effect of the Coulomb 
scattering, the particle of interest would intersect the target plane 
at a different position. We call the vector difference between these 
two positions the trajectory displacement. It is governed by a prob- 
ability distribution. In mathematical terms, we wish to find this 
probability distribution function. 


Because the N-body problem cannot be solved in closed form, 
we are led to seek a suitable approximation. To this end, we imag- 
ine a second particle, also chosen at random. The second particle 
scatters with the first particle, producing a small random tra-
jectory displacement. Now we imagine a third particle, chosen at 
random, producing a small random trajectory displacement of the 
first particle. Similarly, each of the N — 1 particles produces a 
random displacement of the first particle. Each of these scatter- 
ing events is a two-body interaction. As such, each event can be 
solved analytically in principle. We now form the vector sum of all 
of the N — 1 trajectory displacements of the first particle, making 
a resultant trajectory displacement. This is shown schematically 
in Figure 2.16. This is essentially the same approximation used as 
a starting point by Van Leeuwen and Jansen [89], although the 
details of their analysis are quite different from what is presented 
here. The vector sum of the two-body displacements is given by
$\mathbf{X}_S$, while the $N$-body displacement is denoted by $\mathbf{X}_N$. In general,
these two displacements differ, as they were arrived at by different
means.


At this point we form a key hypothesis, namely, the sum of the 
two-body displacements approximates the N-body displacement 
to within an error which is small, compared with the displacement. 
Mathematically, this is expressed as $|\mathbf{X}_N - \mathbf{X}_S| / |\mathbf{X}_N| \ll 1$. This
hypothesis can be tested using Monte Carlo simulation. Two sep- 
arate Monte Carlo simulations are required, one using the N-body 
algorithm described in the previous section, and another using the 
vector superposition of two-body interactions described here [4]. 
The two simulations are run with identical initial conditions for 


Figure 2.16: N-body interaction and superposition of two-body 
interactions. 


each particle in the sample. The resulting trajectory displacement 
is compared for each particle individually, with the difference be- 
tween the two simulations recorded. Statistics are then performed 
on the differences. It was shown in reference [41] that the two 
methods agree within about one percent for a particular severe 
case. The reader is referred to this reference for details. Our hy- 
pothesis can safely be considered to be vindicated for a wide range 
of operating conditions. 


The vector superposition of two-body interactions thus represents 
a useful basis to proceed. This superposition can be formally rep- 
resented as a Markov chain. This is intuitively evident from Fig- 
ure 2.16, in which the vector displacements visually appear like a 
succession of random flights. In fact, the mathematical assump- 
tions are the same, and the above analysis applies. We propose to 
use the most general possible approach, namely, calculating the
trajectory displacement in six-dimensional phase space. The fol- 
lowing analysis closely follows [41], with a few minor changes in 
notation. We define the following quantities: 


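$$\begin{aligned}
\boldsymbol{\chi}_0 &= \text{initial coordinate of an individual particle at time zero} \\
\mathbf{s}_0 &= \text{initial separation of two particles at time zero} \\
\mathbf{s} &= \text{separation of two particles at time } t \text{, interaction present} \\
\mathbf{s}' &= \text{separation of two particles at time } t \text{, interaction absent,}
\end{aligned} \tag{2.285}$$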


where all quantities are six-vectors in phase space. The first three 
components are position, and last three components are momen- 
tum. All quantities are random variables, described by probability 
density functions. 


We define an individual particle trajectory displacement as the
difference of two-particle separations $\mathbf{s}$ with and $\mathbf{s}'$ without
interaction as follows:

$$\boldsymbol{\epsilon} = \tfrac{1}{2}\,(\mathbf{s} - \mathbf{s}'), \tag{2.286}$$

where the factor of $\tfrac{1}{2}$ appears, because the individual particle
displacement is half the change in particle-particle separation for
particles of equal mass. Consistent with the above hypothesis, we
apply the Markov formalism, identifying $\boldsymbol{\epsilon}$ with a single Markovian
step (2.282). The required Fourier transform for the distribution
of trajectory displacements $\boldsymbol{\epsilon}$ is

$$\tilde{\tau}(\mathbf{k}) = \int d^6\epsilon\; \exp(-i\,\mathbf{k} \cdot \boldsymbol{\epsilon})\, \tau(\boldsymbol{\epsilon}). \tag{2.287}$$

We define a probability density $\sigma_0(\boldsymbol{\chi}_0)$ of an initial single-particle
six-coordinate $\boldsymbol{\chi}_0$. For example, the beam might be of uniform
spatial density, and monoenergetic. In this case the initial
distribution $\sigma_0(\boldsymbol{\chi}_0)$ is a constant spatially, multiplied by a delta function
in momentum, within the beam volume, and zero outside the beam
volume.
Given this, the probability density $P_0$ of an initial two-particle
separation $\mathbf{s}_0$ is

$$P_0(\mathbf{s}_0) = \int d^6\chi_0\; \sigma_0(\boldsymbol{\chi}_0)\, \sigma_0(\mathbf{s}_0 - \boldsymbol{\chi}_0), \tag{2.288}$$

where the integral is performed for $\boldsymbol{\chi}_0$ over the initial phase space
volume occupied by the beam. The integrand represents the joint
probability of finding one particle initially at $\boldsymbol{\chi}_0$, and the second
particle displaced by $\mathbf{s}_0$ relative to the first particle. Integrating
over all $\boldsymbol{\chi}_0$ ensures that $P_0$ represents all possible pairs with initial
separation $\mathbf{s}_0$.


Alternatively, one could define

$$P_0(\boldsymbol{\chi}_0, \mathbf{s}_0) = \sigma_0(\boldsymbol{\chi}_0)\, \sigma_0(\mathbf{s}_0 - \boldsymbol{\chi}_0). \tag{2.289}$$

This would retain the correlation with absolute single-particle
six-coordinate $\boldsymbol{\chi}_0$. The first case, in which we integrate over $\boldsymbol{\chi}_0$, leads
by definition to just the stochastic interaction arising from local
fluctuations in the charge density. The second case without
integration leads to the full result. This includes both the stochastic
interaction and the systematic effects arising from the global
charge distribution within the beam. For brevity in the following,
we consider the first case only.


We now make use of the fact that trajectories are conserved in 
phase space. In any small volume of phase space, this is expressed 
as 


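$$d^6N = P_0(\mathbf{s}_0)\, d^6s_0 = P(\mathbf{s})\, d^6s = P'(\mathbf{s}')\, d^6s' = \tau(\boldsymbol{\epsilon})\, d^6\epsilon. \tag{2.290}$$

It follows that (2.287, 2.290):

$$\tilde{\tau}(\mathbf{k}) = \int d^6s_0 \, P_0(\mathbf{s}_0) \exp(-i\mathbf{k}\cdot\boldsymbol{\epsilon}), \tag{2.291}$$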


where the integral is performed over the space of initial
two-particle separations, determined by (2.288). The two-body
scattering has an analytic solution for the separation $\mathbf{s}$ in terms of the
initial condition $\mathbf{s}_0$. Separately, the separation $\mathbf{s}'$ in the absence of
interaction is also determined analytically from $\mathbf{s}_0$. From (2.286)
the trajectory displacement $\boldsymbol{\epsilon}$ is thus found in terms of $\mathbf{s}_0$. This
permits us to perform the integral (2.291). Substituting this into
(2.283), we obtain the general solution for the stochastic Coulomb
interaction, in the vector superposition of two-body interactions.


We are often interested in only the transverse coordinates in some 
target plane, or alternatively, the broadening of kinetic energy, for 
example. In these cases, the other degrees of freedom, such as the 
displacement in the axial coordinate, are superfluous. We need 
to integrate over all of the superfluous degrees of freedom. For- 
tunately, the form of (2.287) makes this particularly simple. Due 
to a theorem of Fourier transforms, setting the frequency k equal 
to zero is equivalent to integrating over the entire range of the 
variable in direct space. By setting the superfluous components of 
k equal to zero, we automatically integrate over these degrees of 
freedom in the direct space of $\boldsymbol{\epsilon}$. This leaves only those degrees of
freedom we are interested in. In particular, this applies to (2.291), 
where the superfluous degrees of freedom integrate to unity. 
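The Fourier theorem invoked here can be checked numerically. The sketch below is illustrative only: the separable two-dimensional Gaussian, the grid, and all function names are assumptions chosen for the example, not quantities from the text. It compares the two-dimensional transform evaluated at $k_y = 0$ with the one-dimensional transform of the distribution after $y$ has been integrated out in direct space.

```python
import cmath
import math


def gaussian2d(x, y, sx=1.0, sy=0.5):
    """Normalized, separable 2-D Gaussian used as a stand-in distribution."""
    norm = 1.0 / (2.0 * math.pi * sx * sy)
    return norm * math.exp(-0.5 * (x / sx) ** 2 - 0.5 * (y / sy) ** 2)


def grid(half_width=8.0, n=161):
    """Uniform grid points and spacing on [-half_width, half_width]."""
    h = 2.0 * half_width / (n - 1)
    return [-half_width + i * h for i in range(n)], h


def transform_2d(kx, ky):
    """Riemann-sum approximation of the 2-D Fourier transform at (kx, ky)."""
    pts, h = grid()
    total = 0j
    for x in pts:
        for y in pts:
            total += cmath.exp(-1j * (kx * x + ky * y)) * gaussian2d(x, y)
    return total * h * h


def transform_of_marginal(kx):
    """Integrate y out in direct space first, then transform in x only."""
    pts, h = grid()
    total = 0j
    for x in pts:
        marginal = sum(gaussian2d(x, y) for y in pts) * h
        total += cmath.exp(-1j * kx * x) * marginal
    return total * h


kx = 0.7
full_at_ky_zero = transform_2d(kx, 0.0)
marginal_route = transform_of_marginal(kx)
print(abs(full_at_ky_zero - marginal_route))
```

The two routes agree to rounding error, which is the statement used above: setting a component of $\mathbf{k}$ to zero integrates over the corresponding direct-space variable.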


The general solution for the trajectory displacement is given by
(2.283, 2.291). In general, the integral in (2.291) must be
performed numerically. There is, however, a special case that admits a
closed-form analytic solution: the case in which all particles are
initially at rest in the rest frame of the beam.
This is equivalent to an initially monoenergetic beam with zero
energy in the rest frame. It follows that the beam is monoenergetic
in the lab frame as well. It is simpler to perform the calculation
in the rest frame, as the magnetic Lorentz force is zero, and the
choice of reference frame does not affect the transverse position.
The initial condition for the particle separation in six dimensions
is


$$P_0(\mathbf{s}_0) = \Psi_0(\mathbf{r}_0)\, \delta(\mathbf{p}_0), \tag{2.292}$$


where $\mathbf{r}_0$ is the initial particle spatial separation, and $\mathbf{p}_0$ is the initial
momentum difference. The spatial separation distribution $\Psi_0(\mathbf{r}_0)$
will be determined later. Integrating over all momenta by setting
the momentum components of $\mathbf{k}$ to zero, we find (2.291)

$$\tilde{\tau}(\mathbf{k}_r; 0) = \int_0^{\infty} d\rho_0\, \rho_0 \int_{-\infty}^{\infty} dz_0\, \Psi_0(\rho_0, z_0) \int_0^{2\pi} d\phi_0 \, \exp(-i\mathbf{k}_r\cdot\boldsymbol{\epsilon}_r), \tag{2.293}$$

where $\mathbf{k}_r$ is the three-vector spatial part of the six-vector $\mathbf{k}$. We
have set the three-vector momentum part $\mathbf{k}_p$ to zero. This is
equivalent to integrating over all momentum values in direct space.


Next we must find the spatial trajectory displacement $\boldsymbol{\epsilon}_r$ at time
t. For the case in which the two particles are initially at rest, the
scattering reduces to the Kepler problem for zero angular
momentum, where the particles fly apart along the line joining them. In
this case the solution to the Kepler problem reduces to

$$\sqrt{\frac{q^2}{\pi\epsilon_0 m r_0^3}}\; t = \frac{r}{r_0}\sqrt{1 - \frac{r_0}{r}} + \tanh^{-1}\sqrt{1 - \frac{r_0}{r}}, \tag{2.294}$$

where t is transit time, $r_0$ is initial separation, and r is separation
at time t. Substituting this into (2.293) we obtain


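$$\tilde{\tau}(\mathbf{k}_\rho, 0; 0) = 2\pi \int_0^{\infty} d\rho_0\, \rho_0 \int_{-\infty}^{\infty} dz_0\, \Psi_0(\rho_0, z_0)\, J_0\!\left[\tfrac{1}{2} k_\rho \rho_0 \left(\frac{r}{r_0} - 1\right)\right], \tag{2.295}$$

where $\mathbf{k}_\rho$ is the two-vector spatial part in the transverse $(\rho, \phi)$
plane. We have made use of assumed axial symmetry and the
integral representation of the Bessel function $J_0(x)$ as

$$J_0(x) = \frac{1}{2\pi}\int_0^{2\pi} d\phi \, e^{-ix\cos\phi}. \tag{2.296}$$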
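The zero-angular-momentum Kepler relation used above can be checked against a direct integration of the equation of motion. The sketch below is a hedged illustration under stated assumptions: two particles of equal mass m and equal charge q start at rest at separation $r_0$, and the dimensionless variables $x = r/r_0$ and $s = t\sqrt{q^2/(\pi\epsilon_0 m r_0^3)}$ are introduced here for convenience. In these variables energy conservation gives $x'' = 1/(2x^2)$, and the closed form is $s = x\sqrt{1 - 1/x} + \tanh^{-1}\sqrt{1 - 1/x}$. The function names are hypothetical.

```python
import math


def closed_form_time(x):
    """Dimensionless transit time s as a function of x = r/r0 >= 1."""
    root = math.sqrt(1.0 - 1.0 / x)
    return x * root + math.atanh(root)


def integrate_separation(s_end, steps=2000):
    """RK4 integration of x'' = 1/(2 x^2), starting from rest at x = 1."""
    def deriv(x, v):
        return v, 0.5 / (x * x)

    h = s_end / steps
    x, v = 1.0, 0.0
    for _ in range(steps):
        k1x, k1v = deriv(x, v)
        k2x, k2v = deriv(x + 0.5 * h * k1x, v + 0.5 * h * k1v)
        k3x, k3v = deriv(x + 0.5 * h * k2x, v + 0.5 * h * k2v)
        k4x, k4v = deriv(x + h * k3x, v + h * k3v)
        x += h * (k1x + 2.0 * k2x + 2.0 * k3x + k4x) / 6.0
        v += h * (k1v + 2.0 * k2v + 2.0 * k3v + k4v) / 6.0
    return x


s_end = 3.0
x_end = integrate_separation(s_end)
# The closed form evaluated at the integrated separation should return s_end.
print(x_end, closed_form_time(x_end))
```

For equal masses, the transverse trajectory displacement entering the Bessel function argument is then $\tfrac{1}{2}\rho_0 (x - 1)$, half the change in separation projected onto the transverse plane.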


We assume the particles are initially uniformly distributed over
a cylindrical volume with radius a and length L. It follows that
$\Psi_0(\mathbf{r}_0)$ is the convolution of a cylindrical volume with itself, as follows:

$$\Psi_0(\rho_0, z_0) = \frac{1}{\pi a^2 L}\,\frac{2}{\pi}\left[\arccos\left(\frac{\rho_0}{2a}\right) - \frac{\rho_0}{2a}\sqrt{1 - \left(\frac{\rho_0}{2a}\right)^2}\,\right]\left(1 - \frac{|z_0|}{L}\right), \tag{2.297}$$

where $\Psi_0$ is nonzero for

$$0 < \rho_0 < 2a, \qquad -L < z_0 < L, \tag{2.298}$$

and zero outside this region. This yields the probability density
for N − 1 particles:


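$$P_N(\epsilon_{N\rho}) = \frac{1}{2\pi}\int_0^{\infty} dk_\rho \, k_\rho \left[\tilde{\tau}(k_\rho, 0; 0)\right]^{N-1} J_0(k_\rho \epsilon_{N\rho}), \tag{2.299}$$

where $\epsilon_{N\rho}$ is the transverse component of the resultant (net)
trajectory displacement, and we have made use of axial symmetry.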
Equations (2.295, 2.297, 2.299) represent the solution. It is shown 
in [41] that this solution agrees quantitatively with Monte Carlo 
simulation for a particular severe case. 
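As a sanity check on the separation distribution, the sketch below integrates an assumed form of the self-convolution of a uniform cylinder over its support and confirms that it is normalized to unity. The form used is the standard overlap area of two circles of radius a, multiplied by the triangular overlap of two segments of length L. The function names and the numerical values of a and L are illustrative assumptions.

```python
import math


def separation_density(rho0, z0, a=1.0, length=3.0):
    """Self-convolution of a uniform cylinder (radius a, given length)."""
    if rho0 >= 2.0 * a or abs(z0) >= length:
        return 0.0
    x = rho0 / (2.0 * a)
    overlap = math.acos(x) - x * math.sqrt(1.0 - x * x)
    prefactor = 2.0 / (math.pi ** 2 * a ** 2 * length)
    return prefactor * overlap * (1.0 - abs(z0) / length)


def total_probability(a=1.0, length=3.0, n_rho=400, n_z=400):
    """Midpoint-rule integral of 2*pi*rho0*Psi0 over the support."""
    d_rho = 2.0 * a / n_rho
    d_z = 2.0 * length / n_z
    total = 0.0
    for i in range(n_rho):
        rho0 = (i + 0.5) * d_rho
        for j in range(n_z):
            z0 = -length + (j + 0.5) * d_z
            total += 2.0 * math.pi * rho0 * separation_density(rho0, z0, a, length)
    return total * d_rho * d_z


print(total_probability())
```

The result is close to one, as required for a probability density over initial separations.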


The main result of this section is contained in equations (2.283, 
2.291) for the six-vector trajectory displacement in phase space. 
In general, the integral in (2.291) must be performed numerically,
given an assumed form for the initial coordinate distribution
$\sigma_0(\boldsymbol{\chi}_0)$. We have further shown that the dimensionality can
be reduced in a straightforward manner by framing the problem 
in terms of Fourier transforms. This has enormous practical sig- 
nificance for extracting quantitative values for the components of 
trajectory displacement. 


2.7 Hamilton-Jacobi theory 


The solution to the general dynamical problem in classical me- 
chanics follows directly from Hamilton’s principle of least action 
(2.8). An alternative, but completely equivalent formulation ex- 
ists, in the theory due to Hamilton and Jacobi. This formulation 
will prove useful in the following chapter, where we will explore 
the correspondence between the classical and quantum mechanical 
descriptions of single-particle motion in the presence of a general 
electromagnetic potential. This section closely follows the analysis
of Goldstein et al. [35].


2.7.1 Canonical transformations


We begin with Hamilton’s equations of motion (2.85), which are
given in their most general form as

$$\frac{\partial H}{\partial P_i} = \dot{Q}_i, \qquad \frac{\partial H}{\partial Q_i} = -\dot{P}_i, \qquad i = 1, \ldots, n, \tag{2.300}$$

where the $Q_i$ are the n generalized coordinates, and the $P_i$ are the n
conjugate canonical momenta. The Hamiltonian is a function of
all coordinates and momenta, where

$$H = H(Q_1, \ldots, Q_n, P_1, \ldots, P_n; t). \tag{2.301}$$

In general, H can have explicit dependence on the time t, which
we regard as a parameter which uniquely specifies a given point
along the trajectory. For brevity, we adopt a vector notation where
H = H(Q, P; t), and

$$Q = (Q_1, \ldots, Q_n), \qquad P = (P_1, \ldots, P_n). \tag{2.302}$$


In principle, Hamilton’s equations can be integrated to find the
coordinates $Q_i(t)$ and canonical momenta $P_i(t)$ as functions of the
time t. This would constitute a formal solution to the general
dynamical problem.
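To make the idea of integrating Hamilton’s equations concrete, the following sketch steps $(Q, P)$ forward in time for a one-dimensional harmonic oscillator. It is only an illustration: the Hamiltonian, the parameter values, the finite-difference derivatives, and the Runge-Kutta stepping are all assumptions chosen for the example, and the function names are hypothetical.

```python
import math


def hamiltonian(q, p, m=1.0, k=4.0):
    """Illustrative Hamiltonian H = P^2/(2m) + k Q^2/2."""
    return p * p / (2.0 * m) + 0.5 * k * q * q


def hamilton_rhs(q, p, h=1e-6):
    """Return (dQ/dt, dP/dt) = (dH/dP, -dH/dQ) by central differences."""
    dh_dp = (hamiltonian(q, p + h) - hamiltonian(q, p - h)) / (2.0 * h)
    dh_dq = (hamiltonian(q + h, p) - hamiltonian(q - h, p)) / (2.0 * h)
    return dh_dp, -dh_dq


def integrate(q0, p0, t_end, steps=2000):
    """Fourth-order Runge-Kutta integration of Hamilton's equations."""
    dt = t_end / steps
    q, p = q0, p0
    for _ in range(steps):
        k1q, k1p = hamilton_rhs(q, p)
        k2q, k2p = hamilton_rhs(q + 0.5 * dt * k1q, p + 0.5 * dt * k1p)
        k3q, k3p = hamilton_rhs(q + 0.5 * dt * k2q, p + 0.5 * dt * k2p)
        k4q, k4p = hamilton_rhs(q + dt * k3q, p + dt * k3p)
        q += dt * (k1q + 2.0 * k2q + 2.0 * k3q + k4q) / 6.0
        p += dt * (k1p + 2.0 * k2p + 2.0 * k3p + k4p) / 6.0
    return q, p


# Start at maximum displacement with zero momentum; omega = sqrt(k/m) = 2.
q_end, p_end = integrate(1.0, 0.0, t_end=1.0)
print(q_end, math.cos(2.0))
```

With these assumed parameters the integrated coordinate tracks $\cos(2t)$ and the value of H stays constant to within the numerical error.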


It is always possible to transform to a new system of coordinates 
and momenta. As a simple example, one could transform from 
Cartesian to spherical coordinates. This would simplify a prob- 
lem with spherical symmetry, such as scattering by a spherically 
symmetric Coulomb potential. The components of canonical mo- 
mentum would also transform to a spherical system. 


In this context, we imagine a transformation to a system where all
coordinates and momenta are constants of the motion. If such a
transformation were possible, this would represent an immediate
formal solution to the general dynamical problem, since the
coordinates and momenta would simply be equal to their initial values
at time zero. Assuming for the moment that such a transformation
can be found, we express the new set as

$$q_i = q_i(Q, P; t), \qquad p_i = p_i(Q, P; t), \qquad i = 1, \ldots, n, \tag{2.303}$$

where the new coordinates $q_i$ and canonical momenta $p_i$ are as
yet unspecified functions of the old $Q_i$ and $P_i$. Here we adopt a
different notation from the earlier sections, for reasons which will
become clear in the following. The $p_i$ are not to be confused with
the components of kinetic momentum described earlier. In order
for the motion to be physically possible, we require that the new
$q_i$, $p_i$ also obey Hamilton’s equations

$$\frac{\partial K}{\partial p_i} = \dot{q}_i, \qquad \frac{\partial K}{\partial q_i} = -\dot{p}_i, \qquad i = 1, \ldots, n, \tag{2.304}$$


where K (q, p; t) is the Hamiltonian in the new system. Any trans- 
formation for which Hamilton’s equations of motion (2.300, 2.304) 
are satisfied in both the old and new systems is called a canonical 
transformation. 


Hamilton’s principle (2.8) can be written separately in the two 
systems (2.19) as 


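$$\delta \int_{t_1}^{t_2} \left[\sum_{i=1}^{n} P_i \dot{Q}_i - H(Q, P; t)\right] dt = 0$$

$$\delta \int_{t_1}^{t_2} \left[\sum_{i=1}^{n} p_i \dot{q}_i - K(q, p; t)\right] dt = 0. \tag{2.305}$$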


In order for both equations to hold, the integrands can differ at 
most by the total time derivative of an arbitrary function F as 
follows: 


$$\sum_{i=1}^{n} P_i \dot{Q}_i - H = \sum_{i=1}^{n} p_i \dot{q}_i - K + \frac{d}{dt} F(Q, q; t), \tag{2.306}$$


where this represents a necessary condition relating the
Hamiltonians H and K in the two systems. The function F is called a
generating function. Expanding dF/dt by the chain rule for partial
derivatives, we find

$$\frac{dF}{dt} = \sum_{i=1}^{n} \left[\frac{\partial F}{\partial Q_i}\dot{Q}_i + \frac{\partial F}{\partial q_i}\dot{q}_i\right] + \frac{\partial F}{\partial t}. \tag{2.307}$$

Equating coefficients of $\dot{Q}_i$ and $\dot{q}_i$ respectively, we obtain

$$P_i = \frac{\partial F}{\partial Q_i}, \qquad p_i = -\frac{\partial F}{\partial q_i}, \qquad K = H + \frac{\partial F}{\partial t}. \tag{2.308}$$
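As a brief illustration of how a generating function fixes the transformation, consider the choice $F = \sum_i Q_i q_i$. This particular F is a standard textbook example assumed here for concreteness; it is not taken from the text above. Applying the three relations just derived gives:

```latex
F(Q, q) = \sum_{i=1}^{n} Q_i q_i
\quad\Longrightarrow\quad
P_i = \frac{\partial F}{\partial Q_i} = q_i, \qquad
p_i = -\frac{\partial F}{\partial q_i} = -Q_i, \qquad
K = H + \frac{\partial F}{\partial t} = H .
```

This transformation simply exchanges the roles of coordinates and momenta, up to a sign, which shows that the two are on an equal footing in the Hamiltonian formulation.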


Other generating functions can be constructed. For example, we
can define a new function S by

$$S(Q, p; t) = F(Q, q; t) + \sum_{i=1}^{n} p_i q_i. \tag{2.309}$$

This is an example of a Legendre transformation. Substituting, it
follows that

$$P_i = \frac{\partial S}{\partial Q_i}, \qquad q_i = \frac{\partial S}{\partial p_i}, \qquad K = H + \frac{\partial S}{\partial t}. \tag{2.310}$$


At this point we make a key assumption: we imagine a
transformation for which K = 0, i.e., the Hamiltonian K in the new system
is identically zero. Assuming such a transformation can be found,
it would follow from Hamilton’s equations of motion in the new
system that

$$\frac{\partial K}{\partial p_i} = \dot{q}_i = 0, \qquad \frac{\partial K}{\partial q_i} = -\dot{p}_i = 0, \qquad i = 1, \ldots, n. \tag{2.311}$$

From this we would immediately find that

$$p_i = \alpha_i = \text{const}, \qquad q_i = \beta_i = \text{const}, \qquad i = 1, \ldots, n, \tag{2.312}$$

consistent with our original intent. The $\alpha_i$ and $\beta_i$ constitute 2n
integration constants. Substituting above, we find

$$H\left(Q_1, \ldots, Q_n, \frac{\partial S}{\partial Q_1}, \ldots, \frac{\partial S}{\partial Q_n}; t\right) + \frac{\partial S}{\partial t} = 0. \tag{2.313}$$

This equation is called the Hamilton-Jacobi equation, and S is
called Hamilton’s principal function. We notice the significant fact
that this equation only contains the variables $Q_i$ and t.


This immediately leads to a formal procedure to solve the
dynamical problem in principle: we substitute $\partial S/\partial Q_i$ for $P_i$ in H,
then integrate to solve for $S(Q, \alpha; t)$. With S known, we then
invert the equation

$$\beta_i = \frac{\partial}{\partial \alpha_i} S(Q, \alpha; t) \tag{2.314}$$

to solve for the $Q_i(t)$; i.e., find $Q_i = Q_i(\alpha, \beta; t)$. The $\alpha_i$ and $\beta_i$
are determined by the initial conditions. This represents a formal
solution to the general dynamical problem.
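The procedure can be illustrated with the simplest possible case, a free particle in one dimension. This worked example is an addition for illustration, with the trial form of S assumed and then verified by substitution:

```latex
\begin{aligned}
H &= \frac{P^2}{2m}
\;\Longrightarrow\;
\frac{1}{2m}\left(\frac{\partial S}{\partial Q}\right)^2 + \frac{\partial S}{\partial t} = 0,
\qquad
S(Q, \alpha; t) = \alpha Q - \frac{\alpha^2}{2m}\, t, \\
\beta &= \frac{\partial S}{\partial \alpha} = Q - \frac{\alpha}{m}\, t
\;\Longrightarrow\;
Q(t) = \beta + \frac{\alpha}{m}\, t,
\qquad
P = \frac{\partial S}{\partial Q} = \alpha .
\end{aligned}
```

The constants $\alpha$ and $\beta$ are recognized as the conserved momentum and the initial position, so inverting the relation for $\beta$ reproduces uniform motion.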


It is of particular interest to consider the important special case
where the original Hamiltonian H has no explicit time dependence.
The above procedure in (2.305) to (2.308) applies, with the
difference that the time t does not appear explicitly in H(Q, P) and
K(q, p). It follows from this that the generating function F(Q, q)
has no explicit time dependence, and

$$\frac{\partial F}{\partial t} = 0. \tag{2.315}$$

We now define a new generating function W(Q, p) as

$$W(Q, p) = F(Q, q) + \sum_{i=1}^{n} p_i q_i. \tag{2.316}$$

From (2.307, 2.315, 2.316) it follows that

$$K = H. \tag{2.317}$$


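Hamilton’s equations in the transformed system are

$$\frac{\partial K}{\partial p_i} = \dot{q}_i, \qquad \frac{\partial K}{\partial q_i} = -\dot{p}_i, \qquad i = 1, \ldots, n. \tag{2.318}$$

We now make the key assumption that

$$H = K = \alpha_1 = p_1 = \text{const}, \tag{2.319}$$

where H is the conserved total energy, which we thus identify with
the first conserved component of the new canonical momentum $p_1$.
It follows immediately that

$$\frac{\partial K}{\partial q_i} = -\dot{p}_i = 0, \qquad p_i = \alpha_i = \text{const}, \tag{2.320}$$

where the $\alpha_i$ form n integration constants. Also, from (2.318,
2.319),

$$\frac{\partial K}{\partial p_i} = \dot{q}_i = \begin{cases} 1 & \text{when } i = 1, \\ 0 & \text{when } i = 2, \ldots, n. \end{cases} \tag{2.321}$$

This leads to

$$q_i = \begin{cases} t + \beta_1 & \text{when } i = 1, \\ \beta_i & \text{when } i = 2, \ldots, n, \end{cases} \tag{2.322}$$

where the $\beta_i$ form n integration constants. From (2.319) it follows
that

$$H\left(Q_1, \ldots, Q_n, \frac{\partial W}{\partial Q_1}, \ldots, \frac{\partial W}{\partial Q_n}\right) = \alpha_1. \tag{2.323}$$

This is the Hamilton-Jacobi equation for the special case where
the Hamiltonian H has no explicit time dependence. The function
W is called Hamilton’s characteristic function. Also, it follows that

$$\beta_i = \begin{cases} \dfrac{\partial}{\partial \alpha_i} W(Q, \alpha) - t, & i = 1, \\[2ex] \dfrac{\partial}{\partial \alpha_i} W(Q, \alpha), & i = 2, \ldots, n. \end{cases} \tag{2.324}$$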

As before, this immediately leads to a formal procedure to solve
the dynamical problem in principle: we substitute $\partial W/\partial Q_i$ for $P_i$
in H, then integrate to solve for $W(Q, \alpha)$. With W known, we
then invert the equation

$$q_i = \frac{\partial}{\partial \alpha_i} W(Q, \alpha), \tag{2.325}$$

to solve for the $Q_i(t)$; i.e., find $Q_i = Q_i(\alpha, \beta)$. The $\alpha_i$ and $\beta_i$ are
determined by the initial conditions. This represents the solution.


The transformations generated by W and S have quite different
properties. From the third of equations (2.310) and from equation
(2.319), the two generating functions are related by

$$S(Q, \alpha; t) = W(Q, \alpha) - \alpha_1 t, \tag{2.326}$$

for the case where H has no explicit time dependence. This
completes the formal solution to the dynamical problem by
Hamilton-Jacobi theory.


It is interesting to explore the relationship between the generating 
functions S and W, and Hamilton’s principle of least action (2.8). 


We begin by making a working hypothesis, namely, Hamilton’s
principal function S can be written as an indefinite integral

$$S(Q, \alpha; t) = \int^{t} L(Q, \dot{Q}; t)\, dt, \tag{2.327}$$

where L is the Lagrangian. We now proceed to test the validity of
this hypothesis. From the definition of the Hamiltonian (2.19) we
rewrite this as

$$S(Q, \alpha; t) = \int^{t} \left[\sum_{i=1}^{n} P_i \dot{Q}_i - H(Q, P; t)\right] dt. \tag{2.328}$$


Taking the partial derivative with respect to $Q_i$, we find

$$\frac{\partial S}{\partial Q_i} = \int dt \, \frac{\partial}{\partial Q_i}\left(\sum_{j=1}^{n} P_j \dot{Q}_j\right) - \int dt \, \frac{\partial H}{\partial Q_i}. \tag{2.329}$$

The first term on the right is identically zero, since the quantity
in large parentheses has no dependence on $Q_i$. From the second
of Hamilton’s equations (2.300), the second term on the right is
equal to $P_i$, giving

$$\frac{\partial S}{\partial Q_i} = P_i, \tag{2.330}$$

identical with the first equation (2.310). Taking the partial
derivative with respect to time, we obtain

$$\frac{\partial S}{\partial t} = \frac{\partial}{\partial t}\int \left(\sum_{i=1}^{n} P_i \, dQ_i\right) - \frac{\partial}{\partial t}\int^{t} H(Q, P; t)\, dt. \tag{2.331}$$

The first term on the right is identically zero, since the integral
has no explicit time dependence. The second term on the right is
H(Q, P; t). This leads to

$$H + \frac{\partial S}{\partial t} = 0, \tag{2.332}$$

identical with (2.313). We assumed in (2.327) that S is an
indefinite integral, and is therefore defined only to within an additive
constant. This integration constant can always be chosen in
principle so that (2.314) is satisfied, remembering that the $\alpha_i$ and $\beta_i$
are constants of the motion. This completes the justification of
our postulate (2.327) for the form of S. We have thus identified
Hamilton’s principal function S with the indefinite integral
corresponding to the action integral in Hamilton’s principle of least
action (2.8).


Next we consider the case where the Hamiltonian H(Q, P) has
no explicit time dependence. We rewrite (2.328) as

$$S(Q, \alpha; t) = \int P \cdot dQ - Ht, \tag{2.333}$$

remembering that $H = \alpha_1$ is the constant total energy. From
(2.326), it follows that

$$W = \int P \cdot ds, \tag{2.334}$$

where the right side is the indefinite path integral along the
physical trajectory. We have thus identified Hamilton’s characteristic
function W with the indefinite integral corresponding to the action
integral in the mechanical equivalent of Fermat’s principle (2.41).


2.7.2 Applications of Hamilton-Jacobi theory


To convey a feeling for how to use the theory, and to show that
it actually works, we now apply it to two well-known examples [35].


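The first example is the one-dimensional harmonic oscillator. The
Hamiltonian is

$$H = \frac{P^2}{2m} + \frac{kQ^2}{2}, \tag{2.335}$$

where k is the spring constant, and Q is the coordinate for
the displacement. Obviously, the Hamiltonian H has no explicit
time dependence. According to our prescription, we now substitute
for the momentum $P = \partial W/\partial Q$. The resulting Hamilton-Jacobi
equation is

$$\frac{1}{2m}\left(\frac{\partial W}{\partial Q}\right)^2 + \frac{kQ^2}{2} = \alpha_1, \tag{2.336}$$

where $\alpha_1$ is the conserved total energy. Hamilton’s characteristic
function W is expressed as the integral

$$W(Q, \alpha_1) = \int dQ \sqrt{2m\alpha_1 - mkQ^2}. \tag{2.337}$$

Also,

$$\beta_1 = \frac{\partial W}{\partial \alpha_1} - t. \tag{2.338}$$

Substituting, we obtain

$$t + \beta_1 = \sqrt{\frac{m}{2\alpha_1}} \int \frac{dQ}{\sqrt{1 - kQ^2/2\alpha_1}} = -\sqrt{\frac{m}{k}}\, \arccos\left(\sqrt{\frac{k}{2\alpha_1}}\, Q\right). \tag{2.339}$$

We invert this to solve for Q as follows:

$$Q(\alpha_1, \beta_1) = \sqrt{\frac{2\alpha_1}{k}} \cos\left[\sqrt{\frac{k}{m}}\,(t + \beta_1)\right]. \tag{2.340}$$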


We now assume an initial condition that the displacement Q is at
its maximum $Q_0$ at t = 0. From this it follows that

$$\beta_1 = 0, \qquad \alpha_1 = \tfrac{1}{2} k Q_0^2, \tag{2.341}$$

where $\alpha_1$ is the total conserved energy. This is equal to the
potential energy at maximum displacement $Q_0$, where the kinetic energy
is zero. Also,

$$Q = Q_0 \cos\left(\sqrt{\frac{k}{m}}\, t\right). \tag{2.342}$$

This represents the solution, expressing the familiar cosinusoidal
motion, where $\sqrt{k/m}$ is the angular frequency.
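The harmonic oscillator result can be tested numerically without using the closed-form integral. In the sketch below, W is evaluated by Simpson’s rule, its derivative with respect to $\alpha_1$ is taken by a central difference, and the result is compared with the elapsed time. Several choices are assumptions made for the illustration: the parameter values, the lower integration limit Q = 0, and the positive-momentum branch of the square root. With these choices $\beta_1 = 0$ corresponds to $Q = \sqrt{2\alpha_1/k}\,\sin(\sqrt{k/m}\,t)$, which differs from the cosine form only by a shift of $\beta_1$.

```python
import math

M, K = 1.0, 4.0  # illustrative mass and spring constant (omega = 2)


def characteristic_function(q, alpha, n=2000):
    """W(Q, alpha) = int_0^Q sqrt(2 m alpha - m k x^2) dx by Simpson's rule."""
    h = q / n

    def integrand(x):
        return math.sqrt(2.0 * M * alpha - M * K * x * x)

    total = integrand(0.0) + integrand(q)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * integrand(i * h)
    return total * h / 3.0


def time_from_w(q, alpha, d_alpha=1e-4):
    """t + beta_1 = dW/d(alpha), estimated by a central difference."""
    upper = characteristic_function(q, alpha + d_alpha)
    lower = characteristic_function(q, alpha - d_alpha)
    return (upper - lower) / (2.0 * d_alpha)


alpha_1 = 2.0                               # total energy; amplitude is 1
omega = math.sqrt(K / M)
t_true = 0.3
q_at_t = math.sqrt(2.0 * alpha_1 / K) * math.sin(omega * t_true)
print(time_from_w(q_at_t, alpha_1), t_true)
```

The derivative of W with respect to the energy returns the elapsed time, which is the content of the inversion step in the Hamilton-Jacobi procedure.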


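As a second example, we study the Kepler problem. This will have
additional significance in classical Rutherford scattering, which
will be explored in Chapter 4. The Hamiltonian is

$$H = \frac{1}{2m}\left(P_r^2 + \frac{P_\theta^2}{r^2}\right) + U(r), \tag{2.343}$$

where U is the potential energy. Hamilton’s equation for the
angular momentum is

$$\frac{\partial H}{\partial \theta} = -\dot{P}_\theta = 0, \qquad P_\theta = \alpha_2 = \text{const}. \tag{2.344}$$

We write the radial and angular momenta as

$$P_r = \frac{\partial W}{\partial r}, \qquad P_\theta = \frac{\partial W}{\partial \theta} = \alpha_2. \tag{2.345}$$

This leads to a separable form for W as follows:

$$W(r, \theta, \alpha_1, \alpha_2) = W_r(r, \alpha_1, \alpha_2) + \alpha_2 \theta. \tag{2.346}$$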

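The Hamilton-Jacobi equation is

$$\frac{1}{2m}\left[\left(\frac{\partial W_r}{\partial r}\right)^2 + \frac{\alpha_2^2}{r^2}\right] + U(r) = \alpha_1. \tag{2.347}$$

Rearranging terms and taking the square root of both sides, we
find

$$\frac{\partial W_r}{\partial r} = \sqrt{2m(\alpha_1 - U) - \frac{\alpha_2^2}{r^2}}. \tag{2.348}$$

Hamilton’s characteristic function W is thus expressed as the
integral

$$W = \int dr \sqrt{2m(\alpha_1 - U) - \frac{\alpha_2^2}{r^2}} + \alpha_2 \theta. \tag{2.349}$$

Also,

$$\beta_1 = \frac{\partial W}{\partial \alpha_1} - t. \tag{2.350}$$

This is equivalent to

$$t + \beta_1 = m \int \frac{dr}{\sqrt{2m(\alpha_1 - U) - \alpha_2^2/r^2}}. \tag{2.351}$$

Furthermore,

$$\beta_2 = \frac{\partial W}{\partial \alpha_2}, \tag{2.352}$$

and

$$\beta_2 = \theta - \alpha_2 \int \frac{dr}{r^2 \sqrt{2m(\alpha_1 - U) - \alpha_2^2/r^2}}. \tag{2.353}$$

To this point we have not yet specified a precise form for the
radially symmetric potential U(r), and the analysis remains general
in this regard.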

At this point we assume an inverse law for U, namely

$$U = \frac{\kappa}{r}, \tag{2.354}$$

where $\kappa$ is a real constant. The Coulomb force between two charges
$q_1$ and $q_2$ has

$$\kappa = \frac{q_1 q_2}{4\pi\epsilon_0}, \tag{2.355}$$

for example. For charges of like sign, $\kappa > 0$, and the force is
repulsive. For charges of opposite sign, $\kappa < 0$, and the force is attractive.
Making a change of variables u = 1/r, the equation for $\theta$ is
immediately integrated to give

$$\theta = \beta_2 - \arccos\left[\frac{\alpha_2^2 u + m\kappa}{\sqrt{m^2\kappa^2 + 2m\alpha_1\alpha_2^2}}\right]. \tag{2.356}$$

We invert this to solve for u = 1/r as a function of $\theta$ as follows:

$$\frac{1}{r} = -\frac{m\kappa}{P_\theta^2}\left[1 + \sqrt{1 + \frac{2HP_\theta^2}{m\kappa^2}}\, \cos(\theta - \theta_0)\right]. \tag{2.357}$$

We have identified

$$\alpha_1 = H, \qquad \alpha_2 = P_\theta, \qquad \beta_2 = \theta_0, \tag{2.358}$$

where H is the conserved total energy, and $P_\theta$ is the conserved
angular momentum. We define a quantity called the eccentricity
as

$$\epsilon = \sqrt{1 + \frac{2HP_\theta^2}{m\kappa^2}}. \tag{2.359}$$

For $0 < \epsilon < 1$ the orbit is an ellipse, for $\epsilon = 1$ it is a parabola,
and for $\epsilon > 1$ it is a hyperbola. The integral (2.351) for t cannot
be expressed in closed form, but we assume an initial condition
$\beta_1 = -t_0$.

The main result is the orbit equation (2.357). This will apply di- 
rectly to classical Rutherford scattering in Chapter 4. 
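The orbit equation can be compared with a direct numerical integration of Newton’s equations. The sketch below treats the repulsive case $\kappa > 0$ and measures $\theta$ from the direction of closest approach, so the orbit is taken in the assumed form $1/r = (m\kappa/P_\theta^2)(\epsilon\cos\theta - 1)$, which corresponds to choosing $\theta_0 = \pi$ in the orbit equation. The unit values of m, $\kappa$, H and $P_\theta$, and all function names, are illustrative assumptions.

```python
import math

M, KAPPA = 1.0, 1.0          # illustrative mass and repulsive strength
ENERGY, ANG_MOM = 1.0, 1.0   # conserved H and P_theta


def eccentricity():
    return math.sqrt(1.0 + 2.0 * ENERGY * ANG_MOM ** 2 / (M * KAPPA ** 2))


def inverse_radius(theta):
    """Orbit equation, theta measured from the point of closest approach."""
    return (M * KAPPA / ANG_MOM ** 2) * (eccentricity() * math.cos(theta) - 1.0)


def deriv(state):
    """Time derivative of (x, y, vx, vy) for the repulsive force kappa/r^2."""
    x, y, vx, vy = state
    r3 = (x * x + y * y) ** 1.5
    return (vx, vy, KAPPA * x / (M * r3), KAPPA * y / (M * r3))


def rk4_step(state, dt):
    def shifted(s, d, f):
        return tuple(si + f * di for si, di in zip(s, d))

    k1 = deriv(state)
    k2 = deriv(shifted(state, k1, 0.5 * dt))
    k3 = deriv(shifted(state, k2, 0.5 * dt))
    k4 = deriv(shifted(state, k3, dt))
    return tuple(s + dt * (a + 2.0 * b + 2.0 * c + d) / 6.0
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


def integrate_orbit(t_end=2.0, steps=4000):
    """Return the largest deviation of 1/r from the orbit equation."""
    r_min = 1.0 / inverse_radius(0.0)
    state = (r_min, 0.0, 0.0, ANG_MOM / (M * r_min))  # start at closest approach
    dt = t_end / steps
    worst = 0.0
    for _ in range(steps):
        state = rk4_step(state, dt)
        x, y = state[0], state[1]
        theta = math.atan2(y, x)
        worst = max(worst, abs(1.0 / math.hypot(x, y) - inverse_radius(theta)))
    return worst


worst_deviation = integrate_orbit()
print(worst_deviation)
```

The integrated trajectory stays on the hyperbola given by the orbit equation to within the integration error, with eccentricity fixed by the conserved energy and angular momentum.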


2.7.3 Hamilton-Jacobi theory and geometrical 
optics 


Adopting the notation of earlier sections, the non-relativistic form
of the conserved Hamiltonian H is

$$H = \frac{p^2}{2m} + U(\mathbf{x}) = \text{const}, \tag{2.360}$$

where $U(\mathbf{x}) = q\phi(\mathbf{x})$ is the time-independent potential energy, and
q is the charge of the particle. The canonical momentum $\mathbf{P}$ is given
by

$$\mathbf{P} = \mathbf{p} + q\mathbf{A}, \tag{2.361}$$

where $\mathbf{p}$ is the kinetic momentum and $\mathbf{A}$ is the magnetic vector
potential. According to the preceding analysis, Hamilton’s
characteristic function W is related to the canonical momentum $\mathbf{P}$ by

$$P_i = \frac{\partial W}{\partial x_i}. \tag{2.362}$$

The Hamilton-Jacobi equation is

$$(\nabla W - q\mathbf{A})^2 = 2m(\alpha_1 - q\phi), \tag{2.363}$$

where $\alpha_1 = H$ is the conserved total energy, and the right side is
the square of the scalar kinetic momentum. In principle, this can
be solved for the trajectory by the above procedure, but no simple,
closed-form solution exists.


This is related to the action integral $W_{ab}$ by

$$\int_a^b \mathbf{P} \cdot d\mathbf{s} = \int_a^b \nabla W \cdot d\mathbf{s} = \int_a^b \frac{\partial W}{\partial s}\, ds = \left[W\right]_a^b = W_{ab}, \tag{2.364}$$

where the integration path corresponds to a physical ray if and
only if W satisfies the Hamilton-Jacobi equation. The optical path
length $W_{ab}$ is identical with Hamilton’s characteristic function
evaluated between the two end points $\mathbf{x}_a$ and $\mathbf{x}_b$. We have made
use of

$$\mathbf{P} = \nabla W, \tag{2.365}$$

which means that the canonical momentum $\mathbf{P}$ is normal to the
surfaces W = const along the ray path. In the case where $\mathbf{A} = 0$
(no magnetic field), the kinetic momentum $\mathbf{p}$ is normal to the
surfaces W = const.


A relativistic generalization for a hypothetical spin-zero particle
can be formed from

$$H = \sqrt{p^2c^2 + m^2c^4} + q\phi = \text{const}, \tag{2.366}$$

which leads to the Hamilton-Jacobi equation

$$\sqrt{(\nabla W - q\mathbf{A})^2 c^2 + m^2c^4} + q\phi = \alpha_1. \tag{2.367}$$

The equations (2.363) and (2.367) represent the main results of
this section. They are not amenable to closed-form analytical
solution, but will have far-reaching consequences in the
correspondence between the classical and quantum mechanical analyses to
come in Chapter 3.


Chapter 3 


Wave optics 


To this point we have discussed the geometrical optics of charged 
particles in electric and magnetic fields, based on relativistic clas- 
sical mechanics. This fails to explain the important class of phe- 
nomena arising from diffraction and interference of matter waves. 
A proper description begins with quantum mechanics. 


This is perhaps best appreciated by considering the analogy with
light optics. Einstein’s original hypothesis in 1905 holds that the
electromagnetic field is quantized. As such, light propagates in
discrete energy packets called photons. Furthermore, according to
a later hypothesis by Einstein, a single photon is endowed with
momentum p which satisfies

$$p = \frac{h}{\lambda}, \tag{3.1}$$

where h is Planck’s constant, given by $h = 6.6261 \times 10^{-34}$ Joule-sec,
and $\lambda$ is the wavelength.


A later hypothesis of de Broglie states that this same relation- 
ship between momentum and wavelength holds for a particle with 
mass and charge. This indicates a close analogy between the dy- 
namical motion of a charged particle and a photon. Both exhibit 
particle- and wave-like behavior. 
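To give a sense of scale, the sketch below evaluates the de Broglie wavelength of an electron from its kinetic energy. It is a non-relativistic illustration only, so it is meaningful for low energies. The electron mass and elementary charge are standard rounded values supplied here as assumptions, Planck’s constant is the value quoted above, and the function name is hypothetical.

```python
import math

PLANCK_H = 6.6261e-34           # J s, value quoted in the text
ELECTRON_MASS = 9.1094e-31      # kg, standard rounded value (assumed)
ELEMENTARY_CHARGE = 1.6022e-19  # C, standard rounded value (assumed)


def de_broglie_wavelength(kinetic_energy_ev, mass=ELECTRON_MASS):
    """Non-relativistic wavelength lambda = h / sqrt(2 m E), in metres."""
    energy_joule = kinetic_energy_ev * ELEMENTARY_CHARGE
    momentum = math.sqrt(2.0 * mass * energy_joule)
    return PLANCK_H / momentum


wavelength_nm = de_broglie_wavelength(100.0) * 1e9
print(f"{wavelength_nm:.4f} nm")
```

For a 100 eV electron the wavelength comes out at roughly 0.12 nm, comparable to atomic dimensions, which is why diffraction effects matter for charged particle beams.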


Interference involves a single particle or photon propagating over
alternative paths, with an associated uncertainty in the path 
taken. Position and momentum are described by a complex am- 
plitude or wave function, with the amplitudes for alternative 
paths adding to form a resultant complex amplitude. The abso- 
lute square of this amplitude gives the probability or intensity. We 
will explore the wave function in detail in the following sections. 
The spatial part of the wave equation governing the propagation 
of this amplitude is the same for a free particle and a free photon. 
Consequently, the formalism of scalar diffraction, interference, and 
image formation for light can be directly applied to particles. 


The path taken by a ray of light can be found from Fermat’s prin- 
ciple, which states that the physical path represents the shortest 
possible transit time through a medium. The path taken by a par- 
ticle can be found from the principle of least action, which states 
that the physical path represents the minimum of the action inte- 
gral. These two principles are strikingly similar. They arose from 
a classical description, but as we shall see in the following, each 
has a wave-optical analog as well. No classical analog exists for 
the quantum mechanical description. However, quantum mechan- 
ical motion of particles and photons approaches classical behavior 
in the high-energy limit. The analogy between particle optics and 
light optics is deep and pervasive. 


The central problem in this chapter is to solve for the wave func- 
tion for a particle of charge q and rest mass m propagating in a 
general electromagnetic potential. With this foundation, we then 
explore a few of the important implications for wave optics. We 
begin with a review of basic quantum mechanics governing parti- 
cle motion. We confine the discussion to only those topics which 
are relevant to the motion of a fast (unbound) charged particle 
in a general electromagnetic potential. This is the subject of the 
following section. 




3.1 Quantum mechanical description 
of particle motion 


We seek a dynamical equation to describe the motion of a single
particle of charge q and rest mass m in the presence of a general
electromagnetic potential. To this end, we begin with a review of
basic quantum mechanics. For clarity, we will do this deductively,
beginning with the fundamental postulates of quantum mechanics,
and proceeding to the motion of a single charged particle in a
general electromagnetic potential. The reader can refer to any of a
number of excellent textbooks on basic quantum mechanics [59],
[79].


3.1.1 The postulates of quantum mechanics 


We begin with a fundamental postulate as follows: 


Every measurable dynamical variable C has a corresponding
operator $\hat{C}$, which satisfies a linear operator equation

$$\hat{C}\psi = c\,\psi. \tag{3.2}$$

The dynamical variable C can be any measurable physical
quantity. Examples include position, momentum, and energy, to name
a few. The operator $\hat{C}$ acts on the function $\psi$, which is called an
eigenfunction. The multiplicative constant c is called an
eigenvalue. In the following we will always denote an operator by a
hat over the letter, to distinguish it from an ordinary variable or
function. The definition of a linear operator is explored further in
Problem 1.


The eigenfunction and eigenvalue are not necessarily unique, but
can take on various values. The number and character of
possible values depends on the physical situation being described. For
clarity, we therefore rewrite the operator equation (3.2) as

$$\hat{C}\psi_j = c_j \psi_j, \tag{3.3}$$

where the subscript j labels the particular eigenfunction $\psi_j$ and
its corresponding eigenvalue $c_j$.


At this point we state a second postulate as follows: 


A single precise measurement of the dynamical variable C yields
one and only one of the eigenvalues $c_j$.


This postulate establishes the physical significance of the
eigenvalues $c_j$, namely, each eigenvalue is a possible result of a
measurement of the corresponding dynamical variable. The physical
significance of the eigenfunctions $\psi_j$ will be made clear later.


We now proceed to apply this formalism to the motion of a charged
particle. We begin by defining the operators which correspond to
the dynamical variables of interest. The operators corresponding
to the three Cartesian coordinates of position $\mathbf{x}$ are defined as

$$\hat{x} = x, \qquad \hat{y} = y, \qquad \hat{z} = z, \tag{3.4}$$

where the operation is multiplication. In words, the operators
corresponding to the Cartesian coordinates are the coordinates
themselves.


The operators corresponding to the three Cartesian components
of the magnetic vector potential $\mathbf{A}(\mathbf{x})$, and to the electrostatic
potential $\phi(\mathbf{x})$ are defined, respectively, as

$$\hat{A}_x = A_x(\mathbf{x}), \qquad \hat{A}_y = A_y(\mathbf{x}), \qquad \hat{A}_z = A_z(\mathbf{x}), \qquad \hat{\phi} = \phi(\mathbf{x}), \tag{3.5}$$

where the operation is again multiplication. We assume for now
that the electromagnetic potentials have no explicit time
dependence. We will generalize this to the time-dependent case in a later
section.


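The operators corresponding to the three Cartesian components
of the classical canonical momentum are defined as

$$\hat{P}_x = -i\hbar\frac{\partial}{\partial x}, \qquad \hat{P}_y = -i\hbar\frac{\partial}{\partial y}, \qquad \hat{P}_z = -i\hbar\frac{\partial}{\partial z}, \tag{3.6}$$

where the operation is partial differentiation. The operators
corresponding to the three Cartesian components of the kinetic
momentum are defined as

$$\hat{p}_x = \hat{P}_x - q\hat{A}_x, \qquad \hat{p}_y = \hat{P}_y - q\hat{A}_y, \qquad \hat{p}_z = \hat{P}_z - q\hat{A}_z, \tag{3.7}$$

by analogy with the classical definition (2.25), where q is the charge
of the particle.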
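A short worked example shows how the first postulate applies to the canonical momentum operator. The plane wave $e^{ikx}$ is chosen here purely for illustration, together with the standard relations $k = 2\pi/\lambda$ and $\hbar = h/2\pi$, which are assumed:

```latex
\hat{P}_x \, e^{ikx} = -i\hbar \frac{\partial}{\partial x}\, e^{ikx} = \hbar k \, e^{ikx},
\qquad
\hbar k = \frac{h}{2\pi}\cdot\frac{2\pi}{\lambda} = \frac{h}{\lambda}.
```

The plane wave is therefore an eigenfunction of $\hat{P}_x$ with eigenvalue $\hbar k$, which reproduces the relation between momentum and wavelength stated at the beginning of the chapter. With zero vector potential the same holds for the kinetic momentum operator.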


Finally, the operator corresponding to the classical Hamiltonian
function H is defined as

$$\hat{H} = i\hbar\frac{\partial}{\partial t}, \tag{3.8}$$

where t is the time, and the operation is partial differentiation.
Although we discuss Cartesian coordinates, this description can
be made to apply to different types of coordinate systems. The
discussion is quite general in this respect. We will continue to use
Cartesian coordinates here, because an intuitive picture emerges
which does not depend strictly on the choice of coordinates.


We now seek an operator equation which describes the evolution
of the particle motion in quantum mechanical terms. The classical
conserved Hamiltonian is given in the nonrelativistic limit by

$$H = \frac{1}{2m}\left(p_x^2 + p_y^2 + p_z^2\right) + q\phi(\mathbf{x}). \tag{3.9}$$

At this point we make a fundamental assumption, namely,

A valid operator equation can be constructed by substituting the
operator expressions for every classical quantity in the dynamical
equation.

Referring to (3.9), this gives

$$\hat{H}\psi(\mathbf{x}, t) = \left[\frac{1}{2m}\left(\hat{p}_x^2 + \hat{p}_y^2 + \hat{p}_z^2\right) + q\hat{\phi}(\mathbf{x})\right]\psi(\mathbf{x}, t). \tag{3.10}$$

Adhering to our description, $\psi(\mathbf{x}, t)$ is an eigenfunction, whose
physical meaning will become clear later.

The operation $\hat{p}_x^2$ is obtained by applying $\hat{p}_x$ twice in succession:
$\hat{p}_x^2 = \hat{p}_x \hat{p}_x$. We assume for now that the magnetic vector potential
$\mathbf{A}$ is zero. The more general case with nonzero $\mathbf{A}$ will be
considered later.


Equating the two expressions (3.8) and (3.10) for the
Hamiltonian operator, we can write down the resulting operator equation
as

$$i\hbar\frac{\partial}{\partial t}\psi(\mathbf{x}, t) = \left[-\frac{\hbar^2}{2m}\left(\frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2}\right) + q\phi(\mathbf{x})\right]\psi(\mathbf{x}, t). \tag{3.11}$$

The eigenfunction $\psi(\mathbf{x}, t)$ depends on the three spatial coordinates
$\mathbf{x} = (x, y, z)$ and the time t. In the following we will make use of
the $\nabla$ notation, where, by definition

$$\nabla^2\psi(\mathbf{x}, t) = \nabla\cdot\nabla\psi(\mathbf{x}, t) = \left(\frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2}\right)\psi(\mathbf{x}, t). \tag{3.12}$$

We therefore write (3.11) as

$$i\hbar\frac{\partial}{\partial t}\psi(\mathbf{x}, t) = -\frac{\hbar^2}{2m}\nabla^2\psi(\mathbf{x}, t) + q\phi(\mathbf{x})\,\psi(\mathbf{x}, t). \tag{3.13}$$

This is known as the time-dependent Schrödinger equation. It is a
linear partial differential equation of second order in $\mathbf{x}$, and first
order in t. It applies to general curvilinear coordinates as well as
Cartesian coordinates, where one substitutes the appropriate form
for the Laplacian operator $\nabla^2$. It can be solved in principle for the
eigenfunction $\psi(\mathbf{x}, t)$, given the explicit form for $\phi$ and appropriate
boundary conditions.
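A quick numerical check helps to make the equation concrete. The sketch below evaluates both sides of the time-dependent Schrödinger equation by finite differences for a one-dimensional free plane wave and reports the residual. The natural units $\hbar = m = 1$, the zero potential, the wave number, and the function names are all assumptions made for the illustration.

```python
import cmath

HBAR, MASS = 1.0, 1.0  # natural units, assumed for illustration


def plane_wave(x, t, k=1.5):
    """Free-particle plane wave with omega = hbar k^2 / (2 m)."""
    omega = HBAR * k * k / (2.0 * MASS)
    return cmath.exp(1j * (k * x - omega * t))


def schrodinger_residual(psi, x, t, potential_energy=lambda x: 0.0, d=1e-3):
    """|i hbar dpsi/dt - (-hbar^2/2m d2psi/dx2 + U psi)| by central differences."""
    dpsi_dt = (psi(x, t + d) - psi(x, t - d)) / (2.0 * d)
    d2psi_dx2 = (psi(x + d, t) - 2.0 * psi(x, t) + psi(x - d, t)) / (d * d)
    lhs = 1j * HBAR * dpsi_dt
    rhs = -HBAR ** 2 / (2.0 * MASS) * d2psi_dx2 + potential_energy(x) * psi(x, t)
    return abs(lhs - rhs)


residual = schrodinger_residual(plane_wave, x=0.3, t=0.7)
print(residual)
```

The residual is zero to within the finite-difference error, confirming that a plane wave with $\hbar\omega = \hbar^2k^2/2m$ satisfies the free-particle equation. Passing a different `potential_energy` function gives a simple way to test other candidate solutions.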


We now investigate the physical meaning of the eigenfunction
$\psi(\mathbf{x}, t)$. We assume that $\psi$ can be written in the separable form

$$\psi(\mathbf{x}, t) = u(\mathbf{x})\,\tau(t), \tag{3.14}$$

where u is a function only of $\mathbf{x}$ and $\tau$ is a function only of t.
The function u is not to be confused with the complex transverse
particle position in the earlier description of classical geometrical
optics. Substituting above, and dividing through by $u\tau$, we obtain

$$i\hbar\,\frac{1}{\tau(t)}\frac{d\tau(t)}{dt} = \frac{1}{u(\mathbf{x})}\left[-\frac{\hbar^2}{2m}\nabla^2 u(\mathbf{x}) + q\phi(\mathbf{x})\,u(\mathbf{x})\right] = H. \tag{3.15}$$

We notice that the left side depends only on t, while the middle
depends only on $\mathbf{x}$. This is only true in the case where $\phi$ is
independent of time, which we assume for now to be the case. This
separation of variables can only hold for all $\mathbf{x}$ and t if both sides
are equal to an arbitrary constant, which we call H. The
physical meaning of H will become apparent in the following, but for
now it is just an arbitrary constant. Substituting, we obtain two
separate, decoupled equations given by

$$\left[\frac{d}{dt} + \frac{iH}{\hbar}\right]\tau(t) = 0$$

$$\nabla^2 u(\mathbf{x}) + \frac{2m}{\hbar^2}\left[H - q\phi(\mathbf{x})\right]u(\mathbf{x}) = 0. \tag{3.16}$$

The first equation is integrated immediately to give

$$\tau(t) = \tau(0)\,e^{-iHt/\hbar}, \tag{3.17}$$

which the reader can verify by direct substitution. Without loss of
generality, we can assume an initial condition $\tau(0) = 1$. The
solution for $u(\mathbf{x})$ depends on the particular form for the electrostatic
potential $\phi(\mathbf{x})$, which we leave unspecified and general for now. It
follows from (3.14) and (3.17) that

$$\psi(\mathbf{x}, t) = u(\mathbf{x})\,e^{-iHt/\hbar}. \tag{3.18}$$


The physical significance of the constant $H$ becomes apparent if
we apply the Hamiltonian operator (3.46) to this form for the
eigenfunction $\psi(\mathbf{x},t)$. This is

$$\hat{H}\,\psi(\mathbf{x},t) = i\hbar\,\frac{\partial}{\partial t}\psi(\mathbf{x},t) = H\,\psi(\mathbf{x},t). \tag{3.19}$$

It is immediately apparent from the first postulate above that the
constant $H$ is the eigenvalue corresponding to the Hamiltonian
operator $\hat{H}$. In the present case, where we assume the potential $\phi(\mathbf{x})$
has no explicit time dependence, the constant $H$ is the eigenvalue
representing the conserved total energy.


Depending on the specific boundary conditions, yet to be specified
for the particular problem at hand, the Schrödinger equation
(3.13) is satisfied only for certain specific values of $u(\mathbf{x})$ and $H$. We
label these $u_j(\mathbf{x})$ and $H_j$, respectively, where the subscript $j$ is only
a label, with integral values assigned for bookkeeping purposes.
According to the second postulate above, a single measurement of
the total energy $H$ must yield one and only one of the possible
values of $H_j$. The presence of the subscript helps to remind one
that the eigenvalues $H_j$ and the dynamical variable corresponding
to the classical Hamiltonian function $H$ are two distinct quantities
which are related to one another by the formalism just described.
Based on this, we can define a set of eigenfunctions which describe
the complete behavior of the particle in space and time. This is

$$\psi_j(\mathbf{x},t) = u_j(\mathbf{x})\,e^{-iH_j t/\hbar}, \tag{3.20}$$

which satisfies the time-dependent Schrödinger equation (3.13).
The eigenfunction $\psi_j$ oscillates with an angular frequency $\omega_j$
defined by

$$H_j = \hbar\,\omega_j. \tag{3.21}$$

The angular frequency is constant for a given constant energy
eigenvalue $H_j$, regardless of the form of the electrostatic potential
$\phi(\mathbf{x})$, assuming the potential has no explicit time dependence.


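We will now proceed to derive two general mathematical properties
of $u_j$ and $H_j$, which will greatly simplify the discussion to
follow. From (3.15) we can write

$$\begin{aligned}
-\frac{\hbar^2}{2m}\nabla^2 u_i(\mathbf{x}) + q\,\phi(\mathbf{x})\,u_i(\mathbf{x}) &= H_i\,u_i(\mathbf{x}) \\
-\frac{\hbar^2}{2m}\nabla^2 \bar{u}_j(\mathbf{x}) + q\,\phi(\mathbf{x})\,\bar{u}_j(\mathbf{x}) &= \bar{H}_j\,\bar{u}_j(\mathbf{x})
\end{aligned} \tag{3.22}$$

for two different values of the indices $i$ and $j$, where we have taken
the complex conjugate of both sides in the second equation.
Multiplying the first equation by $\bar{u}_j$, multiplying the second equation
by $u_i$, and subtracting the second equation from the first, we find

$$-\frac{\hbar^2}{2m}\left[\bar{u}_j(\mathbf{x})\,\nabla^2 u_i(\mathbf{x}) - u_i(\mathbf{x})\,\nabla^2\bar{u}_j(\mathbf{x})\right] = (H_i - \bar{H}_j)\,\bar{u}_j(\mathbf{x})\,u_i(\mathbf{x}), \tag{3.23}$$

where the potential energy term vanishes, assuming $\phi(\mathbf{x})$ is real.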


At this point we specify boundary conditions on $u_j(\mathbf{x})$. We assume
that (3.15) is valid only within a cubic volume of side $L$, where $L$
is arbitrary. Recalling that $\mathbf{x} = (x, y, z)$ in Cartesian coordinates,
we further assume that $u_j(x, y, z)$ satisfies the boundary condition

$$u_j(x + L,\, y + L,\, z + L) = u_j(x, y, z). \tag{3.24}$$


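Mathematically, this represents the periodic extension of the wave
function over all of space. This is therefore called a periodic
boundary condition. There is no loss of generality in this assumption,
because of the arbitrariness of $L$. Next, we integrate (3.23) over
the cubic volume. This leads to

$$-\frac{\hbar^2}{2m}\int_V d^3x\left[\bar{u}_j(\mathbf{x})\,\nabla^2 u_i(\mathbf{x}) - u_i(\mathbf{x})\,\nabla^2\bar{u}_j(\mathbf{x})\right] = (H_i - \bar{H}_j)\int_V d^3x\,\bar{u}_j(\mathbf{x})\,u_i(\mathbf{x}), \tag{3.25}$$

where $V = L^3$ is the cubic volume. Rearranging the left-hand side
and applying the divergence theorem, we obtain

$$-\frac{\hbar^2}{2m}\int_V d^3x\,\nabla\cdot\left[\bar{u}_j(\mathbf{x})\,\nabla u_i(\mathbf{x}) - u_i(\mathbf{x})\,\nabla\bar{u}_j(\mathbf{x})\right] = -\frac{\hbar^2}{2m}\oint_S dS\left[\bar{u}_j(\mathbf{x})\,\frac{\partial u_i(\mathbf{x})}{\partial n} - u_i(\mathbf{x})\,\frac{\partial\bar{u}_j(\mathbf{x})}{\partial n}\right], \tag{3.26}$$

where $S$ is the surface of the cube. The integral vanishes because
of the periodic boundary condition, together with the fact that
the normal derivative is equal and opposite on opposite sides of
the cube. We therefore have

$$(H_i - \bar{H}_j)\int_V d^3x\,\bar{u}_j(\mathbf{x})\,u_i(\mathbf{x}) = 0. \tag{3.27}$$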


In the case $i = j$, we assume that the eigenfunction $u_j(\mathbf{x})$ is
normalized so that

$$\int_V d^3x\,\bar{u}_j(\mathbf{x})\,u_j(\mathbf{x}) = 1. \tag{3.28}$$

It follows that

$$H_j = \bar{H}_j. \tag{3.29}$$

Equivalently, all energy eigenvalues $H_j$ must be real-valued. In the
case $i \neq j$, and assuming $H_i \neq H_j$, the integral in (3.27) must
vanish. We therefore have the general property

$$\int_V d^3x\,\bar{u}_j(\mathbf{x})\,u_i(\mathbf{x}) = \delta_{ij}, \tag{3.30}$$

where $\delta_{ij} = 0$ for $i \neq j$, and $\delta_{ij} = 1$ for $i = j$. The integral
is performed over the cubic volume. This property is known as
orthonormality of the eigenfunctions $u_j(\mathbf{x})$. Equations (3.29) and
(3.30) represent two very useful mathematical properties in the
discussion to follow.


According to the second postulate above, a single precise
measurement of the dynamical variable $H$ must yield one and only
one of the eigenvalues $H_j$. It is logical to enquire what determines
which of the eigenvalues it must be, or is likely to be, given a
specified experimental condition. We now turn our attention to
this question. We define a function

$$\Psi(\mathbf{x},t) = \sum_j a_j\,\psi_j(\mathbf{x},t), \tag{3.31}$$

where all eigenfunctions $\psi_j(\mathbf{x},t)$ are assumed to satisfy the
time-dependent Schrödinger equation (3.13), and where the $\{a_j\}$
represent a set of complex constants, whose values have yet to be
determined. It is straightforward to show by direct substitution
that $\Psi(\mathbf{x},t)$ satisfies the Schrödinger equation as well, namely,

$$-\frac{\hbar^2}{2m}\nabla^2\Psi(\mathbf{x},t) + q\,\phi(\mathbf{x})\,\Psi(\mathbf{x},t) = i\hbar\,\frac{\partial}{\partial t}\Psi(\mathbf{x},t). \tag{3.32}$$


The proof of this is left as a problem at the end of this section. 


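We now enquire into the physical interpretation of the function
$\Psi(\mathbf{x},t)$. Writing out the explicit form of the time-dependent
Schrödinger equation, together with its complex conjugate equation,
we find

$$\begin{aligned}
i\hbar\,\frac{\partial}{\partial t}\Psi(\mathbf{x},t) &= -\frac{\hbar^2}{2m}\nabla^2\Psi(\mathbf{x},t) + q\,\phi(\mathbf{x})\,\Psi(\mathbf{x},t) \\
-i\hbar\,\frac{\partial}{\partial t}\bar{\Psi}(\mathbf{x},t) &= -\frac{\hbar^2}{2m}\nabla^2\bar{\Psi}(\mathbf{x},t) + q\,\phi(\mathbf{x})\,\bar{\Psi}(\mathbf{x},t).
\end{aligned} \tag{3.33}$$

Multiplying the first of these by $\bar{\Psi}$, multiplying the second by $\Psi$,
and subtracting the second equation from the first, we find

$$i\hbar\left(\bar{\Psi}\,\frac{\partial\Psi}{\partial t} + \Psi\,\frac{\partial\bar{\Psi}}{\partial t}\right) = -\frac{\hbar^2}{2m}\left(\bar{\Psi}\,\nabla^2\Psi - \Psi\,\nabla^2\bar{\Psi}\right). \tag{3.34}$$

We can rewrite this as

$$\nabla\cdot\left[\frac{i\hbar}{2m}\left(\Psi\,\nabla\bar{\Psi} - \bar{\Psi}\,\nabla\Psi\right)\right] + \frac{\partial}{\partial t}\left(\bar{\Psi}\,\Psi\right) = 0. \tag{3.35}$$

This has a clear physical interpretation. We define a three-vector
quantity $\mathbf{J}(\mathbf{x},t)$ as

$$\mathbf{J}(\mathbf{x},t) = \frac{i\hbar}{2m}\left(\Psi\,\nabla\bar{\Psi} - \bar{\Psi}\,\nabla\Psi\right), \tag{3.36}$$

and a function $P(\mathbf{x},t)$ as

$$P(\mathbf{x},t) = \bar{\Psi}(\mathbf{x},t)\,\Psi(\mathbf{x},t) = |\Psi(\mathbf{x},t)|^2, \tag{3.37}$$

which is positive-definite. The above equation can be rewritten as

$$\nabla\cdot\mathbf{J} + \frac{\partial P}{\partial t} = 0. \tag{3.38}$$

We immediately recognize this as a conservation equation, in analogy
with fluid flow, where $P(\mathbf{x},t)$ represents a density, and $\mathbf{J}(\mathbf{x},t)$
represents a flux.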


Based on these mathematical arguments, we identify the quantity
$P(\mathbf{x},t)\,d^3x$ as the probability that a single precise measurement
of the particle position will find the particle in a volume element
$d^3x$ about the position $\mathbf{x}$ at time $t$. We therefore call the quantity
$P(\mathbf{x},t)$ a probability density. As a consistency check, we form the
integral over the cubic volume $V$,

$$\int_V d^3x\,\nabla\cdot\mathbf{J} + \frac{\partial}{\partial t}\int_V d^3x\,P(\mathbf{x},t) = 0. \tag{3.39}$$

Using the divergence theorem, the leftmost term can be rewritten
as

$$\int_V d^3x\,\nabla\cdot\mathbf{J} = \oint_S \mathbf{J}\cdot d\mathbf{S}, \tag{3.40}$$

where the surface $S$ surrounds the volume $V$. The integral over the
surface $S$ vanishes, owing to the periodic boundary condition, and
the fact that the normal derivative is equal and opposite on opposite
sides of the cubic volume. It follows that the time derivative
in (3.39) vanishes. We can therefore write

$$\int_V d^3x\,P(\mathbf{x},t) = \int_V d^3x\,|\Psi(\mathbf{x},t)|^2 = 1, \tag{3.41}$$

where we have made use of the fact that $\Psi(\mathbf{x},t)$ can be multiplied
by an arbitrary constant, and still satisfy the time-dependent
Schrödinger equation. According to the probability hypothesis,
this is physically equivalent to the fact that the particle is certain
to be found somewhere within the volume $V$.


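Substituting,

$$\int_V d^3x\,|\Psi(\mathbf{x},t)|^2 = \int_V d^3x\left[\sum_i \bar{a}_i\,\bar{\psi}_i(\mathbf{x},t)\right]\left[\sum_j a_j\,\psi_j(\mathbf{x},t)\right] = \sum_{i,j}\bar{a}_i\,a_j\,e^{i(H_i - H_j)t/\hbar}\int_V d^3x\,\bar{u}_i(\mathbf{x})\,u_j(\mathbf{x}). \tag{3.42}$$

Equivalently,

$$\int_V d^3x\,|\Psi(\mathbf{x},t)|^2 = \sum_j |a_j|^2 = 1. \tag{3.43}$$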
At this point, we invoke a third key postulate, due originally to
Born [9], [10]:

The quantity $|a_j|^2$ represents the probability that any single precise
measurement of the total energy will yield the energy eigenvalue
$H_j$.

From (3.43) the individual probabilities sum to unity as required.
The set of $\{a_j\}$ are referred to as the state vector, and the function
$\Psi(\mathbf{x},t)$ is called the state function.


Based on this probability interpretation, we now define the
expectation value $\langle H\rangle$ of the total energy at time $t$ as

$$\langle H\rangle = \int_V d^3x\,\bar{\Psi}(\mathbf{x},t)\left[\hat{H}\,\Psi(\mathbf{x},t)\right]. \tag{3.44}$$

Substituting the definition (3.31) for the state function $\Psi(\mathbf{x},t)$, it
is straightforward to show that

$$\langle H\rangle = \sum_j |a_j|^2\,H_j \tag{3.45}$$

as expected.


The state function $\Psi(\mathbf{x},t)$ can be written in terms of the set $\{a_j\}$
as

$$\Psi(\mathbf{x},t) = \sum_j a_j\,u_j(\mathbf{x})\,e^{-iH_j t/\hbar}. \tag{3.46}$$

Multiplying both sides from the left by $\bar{u}_i$ and integrating over the
volume $V$, we find

$$\int_V d^3x\,\bar{u}_i(\mathbf{x})\,\Psi(\mathbf{x},t) = \sum_j a_j\,e^{-iH_j t/\hbar}\int_V d^3x\,\bar{u}_i(\mathbf{x})\,u_j(\mathbf{x}). \tag{3.47}$$

Making use of the orthonormality of the $u_j$, this is just

$$a_i = e^{iH_i t/\hbar}\int_V d^3x\,\bar{u}_i(\mathbf{x})\,\Psi(\mathbf{x},t). \tag{3.48}$$

Given the state function $\Psi(\mathbf{x},t)$, we have thus calculated the
coefficients $a_i$ of the state vector. The equations (3.48) and (3.46) are
therefore the inverse of one another.


We now turn our attention to the relationship between theory
and measurement. We will do this in the context of a beam of
charged particles, although the thought process applies to other
quantum mechanical systems as well. The foregoing analysis applies
to a single particle. All relevant information is contained in
the state function $\Psi(\mathbf{x},t)$ and the state vector $\{a_j\}$. The absolute
square of the state function is the probability density that a single
precise measurement will find the particle at position $\mathbf{x}$ at time $t$.
The absolute square of any coefficient $a_j$ is the probability that a
single precise measurement of the energy will yield the eigenvalue
$H_j$. We can consider the particle to exist in a particular state, as
completely specified by these quantities.
One would naturally ask how the particle comes to exist in one
particular state, out of a multiplicity of possible states. The answer
is determined by the initial experimental condition by which
the state of the particle is prepared.


For example, we might have a case where the particle has passed 
through an energy filter, so that the magnitude of the momentum 
is selected to have one particular known value. Downstream from 
this, the particle might be normally incident on a pair of slits 
in an otherwise opaque screen, such that it must pass through 
one of the slits. However it is not known which slit the particle 
passes through. This initial condition defines the state function on 
the front side of the screen as a constant which is independent of 
transverse position. Furthermore, the state vector roughly has one 
particular coefficient $a_j$ equal to one, with all other coefficients
equal to zero. We will see in the following sections that this is 
a monoenergetic plane wave, and the experiment is the familiar 
two-slit experiment. This represents a particular preparation of 
the state, with this preparation being achieved in a controllable 
manner experimentally. With this as an initial condition, the state 
function and state vector then propagate through space and time, 
consistent with the dynamical equation of motion (3.32). 


In this example, one might allow the particle to impinge on a 
phosphor screen placed further downstream. A flash of light is 
emitted at the landing position of the particle. The transverse po- 
sition of the particle is thus measured with high precision. The 
position of a single particle at a single time does not represent a 
great deal of useful information, however. It is much more use- 
ful to measure the macroscopic properties of a beam of particles. 
These properties include the current, the distribution of intensity 
as a function of transverse position, the distribution of intensity 
as a function of angle, and the distribution of kinetic energies. In 
the present example the first two properties are measured. One 
can envision an alternative measurement in which the phosphor 
screen is replaced by a movable detector or a spectrometer. These 
would allow measurement of the intensity as a function of angle
or the distribution of energies. Measurement of these macroscopic
beam properties is conceptually equivalent to repeating the single- 
particle experiment many times, once for each particle in the beam. 
The preparation of the single-particle state is assumed to be the 
same for each particle. This is the conceptual connection between 
theory and measurement. 


In order to better appreciate the physical significance of the the- 
ory, we now explore an important and useful special case, namely, 
a particle moving in a field-free space. This is the subject of the 
next section. 


Problems 


1. Construct explicit expressions for the operators representing 
the three Cartesian components of angular momentum. 


2. By definition, a linear operator $\hat{C}$ satisfies

$$\hat{C}\,(c_1\psi_1 + c_2\psi_2) = c_1\,\hat{C}\psi_1 + c_2\,\hat{C}\psi_2, \tag{3.49}$$

where $c_1$ and $c_2$ are any two complex constants. Examples of linear
operations include multiplication by a constant and differentiation,
to name just two. Prove that all of the operators discussed in this
section are linear.


3. Prove that the operator $\hat{C}\hat{C}$ is linear if $\hat{C}$ is linear.


4. The commutator of two operators $\hat{C}_1$ and $\hat{C}_2$ is defined as

$$[\hat{C}_1, \hat{C}_2] = \hat{C}_1\hat{C}_2 - \hat{C}_2\hat{C}_1. \tag{3.50}$$

(a) Write down an explicit expression for $[\hat{x}, \hat{P}_x]$, where $\hat{x}$ is the
operator for the x-coordinate, and $\hat{P}_x$ is the operator for the
x-component of the canonical momentum.


(b) Write down an explicit expression for $[\hat{x}, \hat{P}_y]$, where $\hat{x}$ is the
operator for the x-coordinate, and $\hat{P}_y$ is the operator for the
y-component of the canonical momentum.

5. Prove that the state function $\Psi(\mathbf{x},t)$ defined by (3.31) satisfies
the time-dependent Schrödinger equation (3.32).


3.1.2 Particle motion in a field-free space 


The special case of a particle in field-free space is represented by
$\phi(\mathbf{x},t) = 0$ and $\mathbf{A}(\mathbf{x},t) = 0$, where $\phi$ and $\mathbf{A}$ represent the
electrostatic scalar potential, and the magnetic vector potential,
respectively. The spatial part of the Schrödinger equation in (3.16)
reduces to

$$\nabla^2 u(\mathbf{x}) + \frac{2mH}{\hbar^2}\,u(\mathbf{x}) = 0, \tag{3.51}$$

where $H$ is the conserved total energy. Equivalently,

$$\nabla^2 u(\mathbf{x}) + k^2\,u(\mathbf{x}) = 0, \tag{3.52}$$

where we have defined a constant $k$ by

$$k^2 = \frac{2mH}{\hbar^2}. \tag{3.53}$$


We propose to integrate this using separation of variables, similar 
to the previous section. We assume that the eigenfunction u(x) 
can be expressed in Cartesian coordinates in separable form as 


u(x, y, z) = X(x) Y (y) Z(z). (3.54) 
Substituting, and dividing through by $XYZ$, this leads to

$$\frac{X''(x)}{X(x)} + \frac{Y''(y)}{Y(y)} + \frac{Z''(z)}{Z(z)} + k^2 = 0. \tag{3.55}$$

This, in turn, leads to three independent equations

$$\begin{aligned}
X''(x) + k_x^2\,X(x) &= 0 \\
Y''(y) + k_y^2\,Y(y) &= 0 \\
Z''(z) + k_z^2\,Z(z) &= 0,
\end{aligned} \tag{3.56}$$

where we have defined three separate constants $k_x$, $k_y$, and $k_z$
which obey

$$k^2 = k_x^2 + k_y^2 + k_z^2. \tag{3.57}$$

Taking the first equation in (3.56), this has two independent
solutions given by

$$X(x) = e^{\pm i k_x x}, \tag{3.58}$$

which the reader can verify by direct substitution. We immediately
recognize a problem, in that the integral of $|X(x)|^2$ over the
range of $x$ from $-\infty$ to $\infty$ is infinite. This is inconsistent with the
probabilistic interpretation of eigenfunctions.


Our analysis is incomplete to this point, however, because we have
yet to specify the boundary conditions. To resolve this, we first
impose the arbitrary condition that $X(x)$ is defined only over the
range $-L/2 < x < L/2$. We further impose the periodic boundary
condition by assuming

$$X(x + L) = X(x), \tag{3.59}$$

which we are completely at liberty to do, without loss of generality.
Substituting, this yields

$$e^{\pm i k_x L} = 1. \tag{3.60}$$

This, in turn, requires that the constant $k_x$ take on discrete values

$$k_x = \frac{2\pi n_x}{L}, \qquad n_x = 0, \pm 1, \pm 2, \ldots \tag{3.61}$$

In addition, the solution can be multiplied by an arbitrary constant,
without affecting its validity. This gives

$$X(x) = \frac{1}{\sqrt{L}}\,e^{i k_x x}. \tag{3.62}$$

It follows that

$$\int_{-L/2}^{L/2} dx\,|X(x)|^2 = 1, \tag{3.63}$$

thus satisfying the normalization condition, required for probability.


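Repeating this for the y- and z-equations, we obtain

$$u_{\mathbf{k}}(\mathbf{x}) = X(x)\,Y(y)\,Z(z) = \frac{1}{\sqrt{V}}\,e^{i(k_x x + k_y y + k_z z)} = \frac{1}{\sqrt{V}}\,e^{i\mathbf{k}\cdot\mathbf{x}}, \tag{3.64}$$

where $V = L^3$ is the volume of the cube. We have adopted the
vector notation $\mathbf{k} = (k_x, k_y, k_z)$. The components $k_x$, $k_y$, and $k_z$
take on the discrete values

$$k_x = \frac{2\pi n_x}{L}, \qquad k_y = \frac{2\pi n_y}{L}, \qquad k_z = \frac{2\pi n_z}{L}, \tag{3.65}$$

where $n_i = 0, \pm 1, \pm 2, \ldots$. The vector $\mathbf{k}$ is called the wave vector.
It is straightforward to show that

$$\int_V d^3x\,\bar{u}_{\mathbf{k}}(\mathbf{x})\,u_{\mathbf{k}'}(\mathbf{x}) = \delta_{\mathbf{k}\mathbf{k}'}, \tag{3.66}$$

where the integral is performed over the cubic volume $V$. The
eigenfunctions $u_{\mathbf{k}}(\mathbf{x})$ are plane waves, each with a unique wave
vector $\mathbf{k}$.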
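
The orthonormality property (3.66) can be checked numerically in one dimension. The following is a minimal sketch, assuming the normalized eigenfunctions of (3.62) on the interval $-L/2 \le x < L/2$; the function names `plane_wave` and `overlap`, the default interval length, and the number of sample points are illustrative choices and are not part of the text.

```python
import cmath
import math


def plane_wave(n, x, L):
    """Normalized 1D eigenfunction (1/sqrt(L)) * exp(i k_n x), with k_n = 2*pi*n/L."""
    k_n = 2.0 * math.pi * n / L
    return cmath.exp(1j * k_n * x) / math.sqrt(L)


def overlap(n, m, L=1.0, samples=256):
    """Riemann-sum estimate of the integral of conj(u_n) * u_m over one period."""
    dx = L / samples
    total = 0j
    for s in range(samples):
        x = -L / 2.0 + s * dx
        total += plane_wave(n, x, L).conjugate() * plane_wave(m, x, L) * dx
    return total


if __name__ == "__main__":
    print(abs(overlap(3, 3)))  # same state: close to 1
    print(abs(overlap(3, 5)))  # different states: close to 0
```

Because the integrand is periodic on the interval, a uniform Riemann sum is accurate to rounding error here, provided the difference of the two quantum numbers is not a multiple of the number of sample points.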


Each set of $n_x$, $n_y$, and $n_z$ represents a distinct state with energy
given by

$$H_{\mathbf{k}} = \frac{\hbar^2}{2m}\left(k_x^2 + k_y^2 + k_z^2\right) = \frac{h^2}{2mL^2}\left(n_x^2 + n_y^2 + n_z^2\right). \tag{3.67}$$

The interval $L$ can be chosen to be arbitrarily large. As $L$ is
increased, the energy values become more closely spaced. In the limit
$L \to \infty$, the energy levels approach a continuum. It does not
follow that the energy eigenvalues become small, since the integers $n_x$,
$n_y$, and $n_z$ can take on arbitrarily large values.
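
The dependence of the level spacing on $L$ can be made concrete with a short calculation. This is a sketch only, assuming the particle is an electron and evaluating (3.67) in SI units; the names `energy` and `level_gap` and the choice of the states $(n, 0, 0)$ are illustrative assumptions.

```python
H_PLANCK = 6.62607015e-34      # Planck constant, J s
M_ELECTRON = 9.1093837015e-31  # electron rest mass, kg (assumed particle)


def energy(nx, ny, nz, L, m=M_ELECTRON):
    """Energy eigenvalue (3.67): h^2 (nx^2 + ny^2 + nz^2) / (2 m L^2), in joules."""
    return H_PLANCK ** 2 * (nx ** 2 + ny ** 2 + nz ** 2) / (2.0 * m * L ** 2)


def level_gap(n, L, m=M_ELECTRON):
    """Energy difference between the states (n + 1, 0, 0) and (n, 0, 0)."""
    return energy(n + 1, 0, 0, L, m) - energy(n, 0, 0, L, m)


if __name__ == "__main__":
    for side in (1e-9, 1e-6, 1e-3):
        print(side, level_gap(1, side))  # the gap shrinks as 1 / L^2
```

Doubling $L$ reduces every gap by a factor of four, while the gap at fixed $L$ grows with $n$, consistent with the remark that the eigenvalues themselves need not become small.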


The eigenvalues $(k_x, k_y, k_z)$ form an infinite cubic lattice of equally
spaced points in k-space. Each lattice point can be regarded as
occupying a cubic volume element of $(2\pi/L)^3$ around the lattice point
in k-space. Based on this, the number of states per unit volume in
k-space is given by

$$\frac{dN}{dV_k} = \frac{V}{(2\pi)^3}, \tag{3.68}$$

where, again, $V = L^3$. A unique wave vector $\mathbf{k}$ exists for each
lattice point with components given by

$$\mathbf{k} = (k_x, k_y, k_z). \tag{3.69}$$


From (3.67), the discrete energy eigenvalue associated with each
lattice point is

$$H_{\mathbf{k}} = \frac{\hbar^2 k^2}{2m}, \tag{3.70}$$

where $k$ is the magnitude of the wave vector $\mathbf{k}$. It follows that
the surfaces of constant energy in k-space are spheres of radius $k$
about the origin $\mathbf{k} = (0, 0, 0)$. A small energy interval is therefore
represented by a spherical shell of volume $dV_k$ and thickness $dk$
where

$$dV_k = 4\pi k^2\,dk. \tag{3.71}$$


We can calculate the number of states per unit energy interval.
This is given using the chain rule for derivatives as

$$\frac{dN}{dH_k} = \frac{dN}{dV_k}\,\frac{dV_k}{dk}\,\frac{dk}{dH_k}. \tag{3.72}$$

It is straightforward to show from (3.68, 3.70, 3.71, 3.72) that

$$\frac{dN}{dH_k} = \frac{4\pi V}{h^3}\sqrt{2m^3 H_k}. \tag{3.73}$$

This quantity will turn out to be very useful later on. In words,
the density of energy states is proportional to the square root of
the energy. This calculation shows the simplification which results
from regarding the states in k-space.
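
The counting argument behind (3.68) can be tested by brute force. With $k_i = 2\pi n_i/L$, the number of states with $|\mathbf{k}| \le K$ equals the number of integer triples $(n_x, n_y, n_z)$ inside a sphere of radius $R = KL/2\pi$, while the continuum estimate from (3.68) is $(V/(2\pi)^3)(4/3)\pi K^3 = (4/3)\pi R^3$. The sketch below compares the two; the function names and the chosen radii are illustrative assumptions.

```python
import math


def count_states(radius):
    """Count integer triples (nx, ny, nz) with nx^2 + ny^2 + nz^2 <= radius^2."""
    r = int(math.floor(radius))
    r2 = radius * radius
    count = 0
    for nx in range(-r, r + 1):
        for ny in range(-r, r + 1):
            for nz in range(-r, r + 1):
                if nx * nx + ny * ny + nz * nz <= r2:
                    count += 1
    return count


def continuum_estimate(radius):
    """Sphere volume measured in lattice cells: (4/3) * pi * radius^3."""
    return 4.0 / 3.0 * math.pi * radius ** 3


if __name__ == "__main__":
    for radius in (5.0, 10.0, 20.0):
        exact = count_states(radius)
        print(radius, exact, exact / continuum_estimate(radius))
```

The ratio approaches unity as the radius grows, which is the sense in which the lattice of states may be treated as a continuum when $L$ is large.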


Separately, it is interesting to see what happens when we apply
the operator for the canonical momentum to the energy
eigenfunctions. This gives

$$-i\hbar\,\nabla u_{\mathbf{k}}(\mathbf{x}) = \hbar\mathbf{k}\,u_{\mathbf{k}}(\mathbf{x}). \tag{3.74}$$

Evidently, the energy eigenfunctions $u_{\mathbf{k}}(\mathbf{x})$ are also the
eigenfunctions of the canonical momentum operator, with eigenvalues $\hbar\mathbf{k}$.
Having assumed that the magnetic vector potential $\mathbf{A}$ is zero, we
therefore identify $\hbar\mathbf{k}$ with the kinetic momentum. This momentum
is proportional to the gradient of $u_{\mathbf{k}}(\mathbf{x})$. It follows that the vector
$\mathbf{k}$ is perpendicular to the surfaces $u = \text{const}$, and therefore points
in the direction of wave propagation, as expected. The wavelength
is given by

$$\lambda = \frac{2\pi}{k}. \tag{3.75}$$

This is the de Broglie wavelength, given by $\lambda = h/p$, where $p$ is
the momentum.


Including the time dependence (3.18), we have

$$\psi_{\mathbf{k}}(\mathbf{x},t) = u_{\mathbf{k}}(\mathbf{x})\,e^{-iH_k t/\hbar} = \frac{1}{\sqrt{V}}\,e^{i(\mathbf{k}\cdot\mathbf{x} - \omega_k t)}, \tag{3.76}$$

where $H_k = \hbar\omega_k$. This is a traveling plane wave, propagating in
the direction $\mathbf{k}$. From (3.57) the wave vector $\mathbf{k}$ and the angular
frequency $\omega_k$ are related by the energy-momentum equation as

$$\omega_k = \frac{\hbar k^2}{2m}, \tag{3.77}$$

where $k^2 = |\mathbf{k}|^2 = \mathbf{k}\cdot\mathbf{k}$. This is called the dispersion relation. The
wave propagates with phase velocity $v_p$ given by

$$v_p = \frac{\omega_k}{k} = \frac{\hbar k}{2m}. \tag{3.78}$$

Evidently, states with higher $k$ (shorter wavelength) propagate
faster than states with lower $k$.

Following the analysis of the preceding section, we define a state
function


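$$\Psi(\mathbf{x},t) = \sum_{\mathbf{k}} a_{\mathbf{k}}\,\psi_{\mathbf{k}}(\mathbf{x},t) = \sum_{\mathbf{k}} a_{\mathbf{k}}\,u_{\mathbf{k}}(\mathbf{x})\,e^{-iH_k t/\hbar} = \frac{1}{\sqrt{V}}\sum_{\mathbf{k}} a_{\mathbf{k}}\,e^{i(\mathbf{k}\cdot\mathbf{x} - \omega_k t)}, \tag{3.79}$$

where the summation over $\mathbf{k}$ represents a summation over all possible
values of $(n_x, n_y, n_z)$. As previously, the probability that a
single precise measurement of the total energy yields a specific
value $H_k$ is given by $|a_{\mathbf{k}}|^2$. Following the procedure of the earlier
analysis, we find

$$a_{\mathbf{k}} = e^{i\omega_k t}\,\frac{1}{\sqrt{V}}\int_V d^3x\,\Psi(\mathbf{x},t)\,e^{-i\mathbf{k}\cdot\mathbf{x}}, \tag{3.80}$$

where the integral is over the cubic volume $V$.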
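
The inverse relationship between (3.79) and (3.80) can be illustrated in one dimension: a state function is built from a chosen set of coefficients, and the coefficients are then recovered by integration. This is a sketch under stated assumptions, namely units in which $\hbar = m = 1$, a one-dimensional interval of length $L$ in place of the cubic volume, and hypothetical function names; the integral is approximated by a uniform Riemann sum.

```python
import cmath
import math


def k_n(n, L):
    """Discrete wave number k_n = 2 * pi * n / L."""
    return 2.0 * math.pi * n / L


def omega_n(n, L):
    """Free-particle dispersion omega = k^2 / 2, in units with hbar = m = 1 (assumed)."""
    return 0.5 * k_n(n, L) ** 2


def state_function(coeffs, x, t, L):
    """Psi(x, t) = (1/sqrt(L)) * sum_n a_n exp(i (k_n x - omega_n t)); coeffs maps n -> a_n."""
    total = 0j
    for n, a in coeffs.items():
        total += a * cmath.exp(1j * (k_n(n, L) * x - omega_n(n, L) * t))
    return total / math.sqrt(L)


def recover_coefficient(n, coeffs, t, L, samples=512):
    """One-dimensional analog of (3.80), with the integral done as a Riemann sum."""
    dx = L / samples
    integral = 0j
    for s in range(samples):
        x = -L / 2.0 + s * dx
        integral += state_function(coeffs, x, t, L) * cmath.exp(-1j * k_n(n, L) * x) * dx
    return cmath.exp(1j * omega_n(n, L) * t) * integral / math.sqrt(L)


if __name__ == "__main__":
    a = {-1: 0.6, 2: 0.8j}  # |0.6|^2 + |0.8|^2 = 1
    for n in (-1, 0, 2):
        print(n, recover_coefficient(n, a, t=0.37, L=2.0))
```

The recovered values do not depend on the time $t$ at which the state function is sampled, because the phase factor $e^{i\omega_k t}$ in (3.80) undoes the time evolution of each component.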


Next we define a function $\Phi(\mathbf{k},t)$ which satisfies

$$\frac{a_{\mathbf{k}}}{\sqrt{V}} = \frac{1}{(2\pi)^{3/2}}\,\frac{dV_k}{dN}\,\Phi(\mathbf{k},t). \tag{3.81}$$

The reason for this precise definition will become clear shortly.
Substituting this into (3.79) and making use of (3.68), the state
function is

$$\Psi(\mathbf{x},t) = \frac{1}{(2\pi)^{3/2}}\sum_{\mathbf{k}}\frac{dV_k}{dN}\,\Phi(\mathbf{k},t)\,e^{i(\mathbf{k}\cdot\mathbf{x} - \omega_k t)}. \tag{3.82}$$

We now consider the limiting case where the cubic volume $V$ is
taken to be very large. According to the preceding arguments, the
lattice of eigenstates in k-space becomes very dense. In this case
the sum can be represented by the integral

$$\Psi(\mathbf{x},t) = \frac{1}{(2\pi)^{3/2}}\int d^3k\,\Phi(\mathbf{k},t)\,e^{i(\mathbf{k}\cdot\mathbf{x} - \omega_k t)}, \tag{3.83}$$

where $d^3k$ is the volume element in k-space corresponding to one
state ($dN = 1$). Separately from (3.68, 3.81),
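$$\frac{a_{\mathbf{k}}}{\sqrt{V}} = \frac{(2\pi)^{3/2}}{V}\,\Phi(\mathbf{k},t), \qquad \text{so that} \qquad \Phi(\mathbf{k},t) = \frac{1}{(2\pi)^{3/2}}\,e^{i\omega_k t}\int_V d^3x\,\Psi(\mathbf{x},t)\,e^{-i\mathbf{k}\cdot\mathbf{x}}, \tag{3.84}$$

where the integral is over the cubic volume $V$. Again taking the
limit of $V$ very large, this becomes equivalent to

$$\Phi(\mathbf{k},t) = \frac{1}{(2\pi)^{3/2}}\,e^{i\omega_k t}\int d^3x\,\Psi(\mathbf{x},t)\,e^{-i\mathbf{k}\cdot\mathbf{x}}, \tag{3.85}$$

where the integral is now over all space. In order for this integral
to converge, it is necessary that the state function falls to zero
for very large $\mathbf{x}$. This is equivalent to saying that the particle is
localized over some finite region of space.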


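The physical significance of $\Phi(\mathbf{k},t)$ can be appreciated by forming
the integral

$$\int d^3k\,|\Phi(\mathbf{k},t)|^2 = \int d^3k\left[\frac{1}{(2\pi)^{3/2}}\,e^{-i\omega_k t}\int d^3x\,\bar{\Psi}(\mathbf{x},t)\,e^{i\mathbf{k}\cdot\mathbf{x}}\right]\left[\frac{1}{(2\pi)^{3/2}}\,e^{i\omega_k t}\int d^3x'\,\Psi(\mathbf{x}',t)\,e^{-i\mathbf{k}\cdot\mathbf{x}'}\right]. \tag{3.86}$$

Rearranging the order of integrations, this is equivalent to

$$\int d^3k\,|\Phi(\mathbf{k},t)|^2 = \int d^3x\,\bar{\Psi}(\mathbf{x},t)\int d^3x'\,\Psi(\mathbf{x}',t)\left[\frac{1}{(2\pi)^3}\int d^3k\,e^{i\mathbf{k}\cdot(\mathbf{x} - \mathbf{x}')}\right]. \tag{3.87}$$

We recognize the quantity in square brackets as the Dirac delta
function, namely,

$$\delta(\mathbf{x} - \mathbf{x}') = \frac{1}{(2\pi)^3}\int d^3k\,e^{i\mathbf{k}\cdot(\mathbf{x} - \mathbf{x}')}. \tag{3.88}$$

Making use of the property of the delta function, this leads
immediately to

$$\int d^3k\,|\Phi(\mathbf{k},t)|^2 = \int d^3x\,|\Psi(\mathbf{x},t)|^2 = 1. \tag{3.89}$$

From this we interpret $|\Phi(\mathbf{k},t)|^2$ as the probability density in
k-space, and $\Phi(\mathbf{k},t)$ as the state function in k-space. Recalling that
the momentum is $\mathbf{p} = \hbar\mathbf{k}$, it follows that $\Phi(\mathbf{k},t)$ describes the
state in momentum space. From (3.83) and (3.85) we see that the
state functions $\Psi(\mathbf{x},t)$ and $\Phi(\mathbf{k},t)$ are related by a Fourier
transform with respect to the spatial variables, but not with respect to
time. The result (3.89) is a general property of Fourier transforms
known as Parseval's theorem.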


As a further example of free-particle propagation, we consider
two sources at $x = \pm\infty$, which radiate in phase with each other.
By the earlier analysis, this gives rise to two individual free-particle
eigenstates, with normalized eigenfunctions given respectively by

$$\begin{aligned}
\psi_{+}(x,t) &= \frac{1}{\sqrt{L}}\exp\left[i(+k_n x - \omega_n t)\right] \\
\psi_{-}(x,t) &= \frac{1}{\sqrt{L}}\exp\left[i(-k_n x - \omega_n t)\right],
\end{aligned} \tag{3.90}$$

where $\hbar k$ is the momentum and $\hbar\omega$ is the energy. The problem is
defined on the interval $-L/2 < x < +L/2$. Consistent with the
earlier analysis, we assume periodic boundary conditions, where $k$
takes on discrete values $k_n = 2\pi n/L$, which approach a continuum
as $L$ approaches infinity. These two plane waves propagate in
opposite directions. The combination of these waves is represented
by the superposition state

$$\Psi(x,t) = \frac{1}{\sqrt{2}}\left(\psi_{+} + \psi_{-}\right). \tag{3.91}$$

Substituting, this is

$$\Psi(x,t) = \sqrt{\frac{2}{L}}\,\cos(k_n x)\,e^{-i\omega_n t}. \tag{3.92}$$

The resulting intensity is

$$|\Psi(x,t)|^2 = \frac{2}{L}\cos^2(k_n x). \tag{3.93}$$


We notice immediately that the time has dropped out, thus forming
a standing wave. This satisfies the normalization condition

$$\int_{-L/2}^{L/2} dx\,|\Psi(x,t)|^2 = 1 \tag{3.94}$$

for all nonzero wave numbers $k_n = 2\pi n/L$. The intensity is plotted in
Figure 3.1 for the case $n = 2$. The intensity exhibits bright and
dark fringes, indicating constructive and destructive interference,
respectively. The spatial period of the fringes is inversely proportional
to $k_n$, which can take on a multiplicity of values.

Figure 3.1: Standing-wave fringe pattern for counterpropagating
plane waves. (Horizontal axis: $x$ from $-L/2$ to $+L/2$.)

Experimentally the intensity distribution is a property of the two
sources, i.e., the way in which the system is prepared. The two 
sources are said to radiate coherently, because they have a def- 
inite, constant phase relationship to one another. This situation 
can be realized experimentally by impinging a parallel beam on a 
positively charged fiber in otherwise field-free space. The beam is 
bent toward the fiber on both sides, thus creating a region where 
the beams from the two sides interact coherently with each other. 
Such an arrangement is called a biprism, and has been demon- 
strated. We have seen from this example that a single-particle 
state with just two individual eigenstates populated shows strik- 
ing and distinctive interference behavior. 
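
The standing-wave intensity (3.93) and its normalization (3.94) can be checked with a few lines of code. This is a minimal sketch, assuming a unit interval length and the case $n = 2$ shown in Figure 3.1; the function names and the number of sample points are illustrative, and the integral is approximated by a uniform Riemann sum.

```python
import math


def intensity(x, n, L):
    """|Psi|^2 = (2/L) cos^2(k_n x) with k_n = 2 * pi * n / L, as in (3.93)."""
    k = 2.0 * math.pi * n / L
    return (2.0 / L) * math.cos(k * x) ** 2


def total_probability(n, L, samples=1000):
    """Riemann-sum estimate of the integral of |Psi|^2 over -L/2 <= x < L/2."""
    dx = L / samples
    return sum(intensity(-L / 2.0 + s * dx, n, L) * dx for s in range(samples))


def fringe_period(n, L):
    """Distance between adjacent bright fringes: pi / k_n = L / (2 n)."""
    return L / (2.0 * n)


if __name__ == "__main__":
    print(total_probability(2, 1.0))                    # close to 1
    print(fringe_period(2, 1.0))                        # spacing of bright fringes
    print(intensity(0.5 * fringe_period(2, 1.0), 2, 1.0))  # dark fringe: close to 0
```

The fringe spacing is $\pi/k_n$ rather than $2\pi/k_n$ because the intensity depends on $\cos^2$, which repeats twice per wavelength.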


Problems 


1. Write down explicit expressions for the normalized eigenfunc- 
tions, eigenvalues, and dispersion relations for a free particle in 
one Cartesian dimension, assuming periodic boundary conditions 
on a spatial interval of length L. 


2. Estimate the quantum number $n_x$ for a free electron with energy
1 keV moving in a drift length L = 1 m. The correspondence prin- 
ciple states that quantum mechanical particle motion approaches 
classical behavior in the limit of large quantum numbers. 


3. The state function $\Psi(\mathbf{x},t)$ is said to describe single-particle
motion in the energy representation, since the eigenvalues of the
Hamiltonian operator $\hat{H}$ represent conserved energy. The state
function $\Phi(\mathbf{k},t)$ is said to describe the momentum representation.
Write down explicit expressions for the eigenfunctions and
eigenvalues for a free particle in the momentum representation.


3.1.3 Wave packet propagation and the Heisenberg uncertainty principle


From (3.79) the state function $\Psi(\mathbf{x},t)$ represents a superposition
of individual plane waves propagating in space and time. Each
plane wave is described by a single eigenfunction $\psi_{\mathbf{k}}(\mathbf{x},t)$ with
well-defined wave vector eigenvalue $\mathbf{k}$ and angular frequency eigenvalue
$\omega_k$. All of the individual plane waves interfere with one another to
form a wave packet. This describes the propagation of a single
particle. We can gain an intuitive feel for this by considering one
spatial dimension. The eigenfunction for a single state $k$ is

$$\psi_k(x,t) = \frac{1}{\sqrt{L}}\,e^{i(kx - \omega t)}. \tag{3.95}$$


Assuming the wave packet consists of individual eigenstates which
are close together in energy, there is some central $(k_0, \omega_0)$ for which
the waves interfere constructively. This is represented by an
extremum condition

$$\left.\frac{d}{dk}(kx - \omega t)\right|_{k_0,\,\omega_0} = 0, \tag{3.96}$$

where the derivative is evaluated at $(k_0, \omega_0)$. Performing the
differentiation, we find

$$\frac{x}{t} = \left.\frac{d\omega}{dk}\right|_{k_0,\,\omega_0}, \tag{3.97}$$

where the left side is the velocity of propagation. We thus define
the group velocity as

$$v_g = \left.\frac{d\omega}{dk}\right|_{k_0,\,\omega_0}. \tag{3.98}$$

Generalizing this to three dimensions, this is

$$\mathbf{v}_g = \left[\nabla_{\mathbf{k}}\,\omega(\mathbf{k})\right]_{\mathbf{k}_0,\,\omega_0}. \tag{3.99}$$

We see from the dispersion relation that

$$\mathbf{v}_g = \frac{\hbar\mathbf{k}_0}{m}. \tag{3.100}$$

Since the numerator is the momentum, we identify the group
velocity with the classical particle velocity. In this sense, the motion
of the group corresponds to the classical particle motion.
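
The distinction between the phase velocity (3.78) and the group velocity (3.100) can be seen by differentiating the dispersion relation (3.77) numerically. The sketch below assumes an electron and an arbitrary illustrative wave number $k_0$; the function names, the central-difference step, and the value of $k_0$ are assumptions made for the example only.

```python
HBAR = 1.054571817e-34         # reduced Planck constant, J s
M_ELECTRON = 9.1093837015e-31  # electron rest mass, kg (assumed particle)


def omega(k, m=M_ELECTRON):
    """Free-particle dispersion relation (3.77): omega = hbar k^2 / (2 m)."""
    return HBAR * k * k / (2.0 * m)


def phase_velocity(k, m=M_ELECTRON):
    """Phase velocity (3.78): omega / k."""
    return omega(k, m) / k


def group_velocity(k0, m=M_ELECTRON, rel_step=1e-4):
    """Central-difference estimate of d(omega)/dk evaluated at k0, as in (3.98)."""
    dk = rel_step * k0
    return (omega(k0 + dk, m) - omega(k0 - dk, m)) / (2.0 * dk)


if __name__ == "__main__":
    k0 = 1.6e11  # illustrative wave number in 1/m
    print(group_velocity(k0), HBAR * k0 / M_ELECTRON)  # numerical vs. analytic
    print(group_velocity(k0) / phase_velocity(k0))     # ratio close to 2
```

For this quadratic dispersion relation the group velocity is twice the phase velocity, and it is the group velocity that coincides with the classical velocity $p/m$.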


To further illustrate the significance of the state function, we
consider a particular example. We assume that the system can be
prepared experimentally, so that $\Psi(x,0)$ takes the form

$$\Psi(x,0) = \frac{1}{(2\pi\sigma^2)^{1/4}}\exp\left(-\frac{x^2}{4\sigma^2}\right) \tag{3.101}$$

in one dimension. This is intentionally constructed so that
$|\Psi(x,0)|^2$ is a Gaussian distribution, and the integral over
$-\infty < x < \infty$ is unity, as required for a probability distribution. This
is often referred to as a Gaussian wave packet. The quantity $\sigma$ is
known as the standard deviation, and is a measure of the width of
$|\Psi(x,0)|^2$. We therefore define the uncertainty in the x-coordinate
as

$$\Delta x = \sigma. \tag{3.102}$$


Next, we form $\Phi(k)$ in one dimension. This is

$$\Phi(k) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} dx\,\Psi(x,0)\,e^{-ikx}. \tag{3.103}$$

Substituting for $\Psi(x,0)$, we find

$$\Phi(k) = \frac{1}{\sqrt{2\pi}}\,\frac{1}{(2\pi\sigma^2)^{1/4}}\int_{-\infty}^{\infty} dx\,e^{-a^2x^2}\,e^{-ikx}, \tag{3.104}$$

where we have defined

$$a^2 = \frac{1}{4\sigma^2}. \tag{3.105}$$

From tables,

$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} dx\,e^{-a^2x^2}\,e^{-ikx} = \frac{1}{a\sqrt{2}}\exp\left(-\frac{k^2}{4a^2}\right), \tag{3.106}$$

so that

$$\Phi(k) = \left(\frac{2\sigma^2}{\pi}\right)^{1/4}\exp\left(-\sigma^2k^2\right). \tag{3.107}$$

The standard deviation of $|\Phi(k)|^2$ is immediately recognizable as

$$\Delta k = a = \frac{1}{2\sigma}. \tag{3.108}$$

Equivalently,

$$\Delta x\,\Delta k = \frac{1}{2}. \tag{3.109}$$

Recalling the momentum $p = \hbar k$, it follows that

$$\Delta x\,\Delta p = \frac{\hbar}{2}. \tag{3.110}$$

This is an example of the Heisenberg uncertainty principle, which
states that one can never know the position and momentum
simultaneously to a precision better than this.
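
The result $\Delta x\,\Delta k = 1/2$ for the Gaussian wave packet can be confirmed without recourse to integral tables by carrying out the Fourier transform (3.103) numerically. The following sketch assumes dimensionless units and uses simple Riemann sums on finite grids; the function names, grid sizes, and cutoffs are illustrative choices, adequate only because the Gaussian falls off rapidly.

```python
import math


def psi0(x, sigma):
    """Gaussian wave packet (3.101) at t = 0."""
    return (2.0 * math.pi * sigma ** 2) ** -0.25 * math.exp(-x ** 2 / (4.0 * sigma ** 2))


def phi(k, sigma, half_width=10.0, samples=2000):
    """Phi(k) from (3.103) by a Riemann sum; psi0 is real and even, so only cos(kx) survives."""
    dx = 2.0 * half_width * sigma / samples
    total = 0.0
    for i in range(samples + 1):
        x = -half_width * sigma + i * dx
        total += psi0(x, sigma) * math.cos(k * x) * dx
    return total / math.sqrt(2.0 * math.pi)


def std_dev(points, weights):
    """Standard deviation of a distribution sampled on a uniform grid."""
    norm = sum(weights)
    mean = sum(p * w for p, w in zip(points, weights)) / norm
    return math.sqrt(sum((p - mean) ** 2 * w for p, w in zip(points, weights)) / norm)


def uncertainty_product(sigma, grid=201):
    """Return Delta x * Delta k computed from |Psi|^2 and |Phi|^2."""
    xs = [sigma * (-10.0 + 20.0 * i / (grid - 1)) for i in range(grid)]
    ks = [(-5.0 + 10.0 * i / (grid - 1)) / sigma for i in range(grid)]
    delta_x = std_dev(xs, [psi0(x, sigma) ** 2 for x in xs])
    delta_k = std_dev(ks, [phi(k, sigma) ** 2 for k in ks])
    return delta_x * delta_k


if __name__ == "__main__":
    for sigma in (0.3, 1.0):
        print(sigma, uncertainty_product(sigma))  # close to 0.5 in both cases
```

The product is independent of $\sigma$: a narrower packet in position is correspondingly broader in wave number.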


Given that the state function $\Psi(\mathbf{x},t)$ depends on the experimental
conditions, we now return to the question of how one goes about
preparing a system experimentally. We described a practical
approach to preparing a state consisting of a monochromatic plane
wave. As a further example, we now discuss the preparation of a
state consisting of a wave packet. We consider an electron beam
emitted from a thermionic (hot) source, and accelerated through
a potential difference $\phi_0$. The electrons in the beam have a spread
of energies of the order $\Delta H = kT$, where $k$ is Boltzmann's
constant, and $T$ is the absolute temperature of the electron source.
The energies are distributed about a central value $H_0 = e\phi_0$. From
the energy-momentum relation, we have

$$\hbar k = \sqrt{2mH}. \tag{3.111}$$

Taking the differential of both sides, we find a spread of momentum

$$\Delta p = \hbar\,\Delta k = \sqrt{\frac{m}{2H}}\,\Delta H = \sqrt{\frac{m}{2e\phi_0}}\,kT. \tag{3.112}$$

Regarding the wave packet as Gaussian, and invoking the
uncertainty principle, this leads to an uncertainty in the position of the
particle along the beam axis given by

$$\Delta z = \frac{\hbar}{2\,\Delta p} = \frac{\hbar}{kT}\sqrt{\frac{e\phi_0}{2m}}, \tag{3.113}$$

where we can regard $\Delta z$ as the extent of the wave packet.
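
To get a feel for the magnitudes involved, the momentum spread (3.112) and wave-packet extent (3.113) can be evaluated for representative numbers. The source temperature of 2700 K and the accelerating potential of 1000 V used below are assumed values chosen for illustration only, and the function names are hypothetical; the physical constants are in SI units.

```python
import math

HBAR = 1.054571817e-34  # reduced Planck constant, J s
K_B = 1.380649e-23      # Boltzmann constant, J/K
Q_E = 1.602176634e-19   # elementary charge, C
M_E = 9.1093837015e-31  # electron rest mass, kg


def momentum_spread(temperature, phi0):
    """Momentum spread (3.112): sqrt(m / (2 e phi0)) * k T, in kg m/s."""
    return math.sqrt(M_E / (2.0 * Q_E * phi0)) * K_B * temperature


def packet_extent(temperature, phi0):
    """Wave-packet extent (3.113): hbar / (2 * Delta p), in metres."""
    return HBAR / (2.0 * momentum_spread(temperature, phi0))


if __name__ == "__main__":
    # Assumed illustrative values: 2700 K source, 1000 V accelerating potential.
    print(packet_extent(2700.0, 1000.0))  # a few times 1e-8 m
```

Under these assumptions the packet extent is a few tens of nanometres, which grows with the accelerating potential and shrinks as the source temperature rises.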

In summary, the absolute square $|\Psi(\mathbf{x},t)|^2$ of the state function
is the probability density that any single measurement will find
a single particle at position $\mathbf{x}$ at time $t$. As such, this quantity
has primary physical significance. The state function, in turn, is
a linear superposition of individual eigenfunctions $\psi_j(\mathbf{x},t)$, with
the coefficients $a_j$ determined by the way in which the system is
prepared experimentally. The eigenfunctions $\psi_j$ with associated
energy eigenvalues $H_j$ represent solutions to the Hamiltonian
operator equation, which, in turn, governs the dynamical behavior
of the particle. Any single measurement of a single particle must
find the particle in one, and only one eigenstate. In particular, a
single measurement of the particle energy yields one, and only one
eigenvalue $H_j$. The probability of finding the particle in the jth
eigenstate is $|a_j|^2$.


Problems 


1. An electron beam is accelerated to an energy of 1.0 keV. The
beam is then made to pass through an energy filter which transmits
only electrons with a spread of energies $\Delta E = 0.025$ eV
about the mean energy. Estimate the uncertainty in arrival time
of a single electron at a point just at the exit from the energy filter.


2. Electrons are emitted from a cathode and accelerated to form a 
beam. Describe in words the conceptual relationship between the 
macroscopic beam properties (current, energy, energy spread, path 
length, and transit time) and the quantum mechanical motion of 
a single beam electron. Assume the beam electrons do not interact 
significantly with one another. 
3.1.4 The quantum mechanical analog of Fermat’s principle for matter waves


Fermat’s principle describes the propagation of light, or more gen- 
erally electromagnetic radiation. It states that light propagates 
along a path which minimizes the transit time. Mathematically 
this is equivalent to the statement that every physically allowable 
ray satisfies the condition that the line integral of the index of 
refraction n is stationary with respect to infinitesimal variations. 
This is

$$\delta \int n\, ds = 0. \tag{3.114}$$

The index of refraction n is a property of the medium through 
which the light propagates. It is defined as 


$$n = \frac{c}{v_p}, \tag{3.115}$$

where c is the speed of light in vacuum, and $v_p$ is the phase velocity
of propagation in the particular medium. In general n can vary
from point to point in the medium, and is therefore a function
of position. In vacuum $v_p = c$ and n = 1. Since c is a constant,
Fermat’s principle can be written in the alternative form

$$\delta \int \frac{ds}{v_p} = 0. \tag{3.116}$$

Taking $v_p = ds/dt$, this says physically that the path chosen by
the light ray is the one for which the propagation time is an
extremum. In fact, the propagation time is a minimum.
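
The minimum-time statement can be checked numerically. The sketch below is an illustration under stated assumptions, not part of the original derivation: two uniform media with indices $n_1$ and $n_2$ meet at the plane y = 0, and the crossing point of the ray is varied until the transit time is a minimum. The geometry, the index values, and the helper names `transit_time` and `minimize_golden` are all hypothetical. The minimizing path satisfies Snell’s law, a standard consequence of (3.116) that is not stated in the text.

```python
import math

def transit_time(x_cross, a, b, n1, n2, c=1.0):
    """Travel time from a = (xa, ya > 0) in medium n1 to b = (xb, yb < 0) in
    medium n2, crossing the interface y = 0 at x_cross. The phase velocity
    in each medium is c / n."""
    xa, ya = a
    xb, yb = b
    path1 = math.hypot(x_cross - xa, ya)
    path2 = math.hypot(xb - x_cross, yb)
    return (n1 * path1 + n2 * path2) / c

def minimize_golden(f, lo, hi, tol=1e-10):
    """Golden-section search for the minimum of a unimodal function on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    x1 = hi - inv_phi * (hi - lo)
    x2 = lo + inv_phi * (hi - lo)
    f1, f2 = f(x1), f(x2)
    while hi - lo > tol:
        if f1 < f2:
            # minimum lies in [lo, x2]
            hi, x2, f2 = x2, x1, f1
            x1 = hi - inv_phi * (hi - lo)
            f1 = f(x1)
        else:
            # minimum lies in [x1, hi]
            lo, x1, f1 = x1, x2, f2
            x2 = lo + inv_phi * (hi - lo)
            f2 = f(x2)
    return 0.5 * (lo + hi)

# Assumed geometry and indices (illustrative only)
A, B = (0.0, 1.0), (1.0, -1.0)
N1, N2 = 1.0, 1.5

x_star = minimize_golden(lambda x: transit_time(x, A, B, N1, N2), 0.0, 1.0)
sin1 = (x_star - A[0]) / math.hypot(x_star - A[0], A[1])
sin2 = (B[0] - x_star) / math.hypot(B[0] - x_star, B[1])
print(N1 * sin1, N2 * sin2)  # the two products agree at the minimum-time path
```
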


According to an analysis by Fermi [27], a quantum mechanical 
analogy with Fermat’s principle exists, which describes propaga- 
tion of a single particle. A general property of propagating waves 
says that the phase velocity can be written as $v_p = \nu\lambda$, where $\nu$
is the temporal frequency, and $\lambda$ is the wavelength. In the case
where the electromagnetic potentials have no explicit time
dependence, the total energy is conserved. The conserved total energy
is $H = h\nu$, from which it follows that $\nu$ is a constant. In this case
Fermat’s principle is equivalent to

$$\delta \int \frac{ds}{\lambda} = 0. \tag{3.117}$$


Physically, all hypothetical rays in an infinitesimal neighborhood 
surrounding the physical ray interfere constructively. In this con- 
text, Fermat’s principle is fundamentally wave-mechanical. 


Separately, the physical trajectory of a classical point particle with 
mass m obeys the principle of least action, 


$$\delta \int p\, ds = 0, \tag{3.118}$$

where we have assumed that the magnetic vector potential A is
zero, and the electrostatic potential $\phi(x)$ has no explicit time
dependence. Substituting for the kinetic momentum p, this is
equivalent in one dimension to

$$\delta \int \sqrt{2m\,[H - U(x)]}\; ds = 0, \tag{3.119}$$

where $U(x) = q\phi(x)$ is the potential energy, and H is the
conserved total energy. The integrand can be regarded as an index of
refraction in the mechanical analog of Fermat’s principle in clas- 
sical mechanics. Thus we have two alternative expressions for the 
index of refraction. They are not equivalent, since one is wave- 
mechanical, and the other is derived from classical mechanics. 


Following Fermi, equations (3.116) and (3.119) guide us to form a 
working assumption, namely, the phase velocity $v_p$ can be written
in the analogous functional form

$$\frac{1}{v_p} = f(\omega)\sqrt{H(\omega) - U(x)}, \tag{3.120}$$

where $f(\omega)$ and $H(\omega)$ are arbitrary functions of the angular
frequency $\omega$, yet to be determined. We will investigate the validity
of this assumption in the following.
The phase velocity $v_p$ is given quite generally by

$$v_p = \frac{\omega}{k}, \tag{3.121}$$

where k is the wave number given by $k = 2\pi/\lambda$, and $\omega$ is the
angular frequency given by $\omega = 2\pi\nu$. Substituting above, this
gives

$$k = \omega f(\omega)\sqrt{H(\omega) - U(x)}. \tag{3.122}$$


This represents a dispersion formula, relating the wave number k
and the angular frequency $\omega$.

To this point we have regarded k and $\omega$ to be fixed, with each taking
on a single value. In practice, the quantum mechanical state
consists of a superposition of multiple eigenstates, with each state
characterized by a unique value of k and a unique value of $\omega$. This
superposition represents a wave packet, which propagates with a
group velocity $v_g$, given by (3.98) in one dimension as

$$v_g = \frac{d\omega}{dk}. \tag{3.123}$$

The derivative is evaluated at central values of k and $\omega$, for which
all partial waves associated with the individual eigenstates interfere
constructively. From (3.122) and (3.123) this gives

$$\frac{1}{v_g} = \frac{dk}{d\omega} = \sqrt{H(\omega) - U(x)}\;\frac{d}{d\omega}\left[\omega f(\omega)\right] + \frac{\omega f(\omega)}{2\sqrt{H(\omega) - U(x)}}\,\frac{dH}{d\omega}. \tag{3.124}$$

At this point we make a further working assumption, namely

$$\frac{d}{d\omega}\left[\omega f(\omega)\right] = 0, \tag{3.125}$$

which we will proceed to vindicate later. It follows from this that

$$\omega f(\omega) = \mathrm{const}. \tag{3.126}$$
According to the correspondence principle, the group velocity $v_g$
tends to the classical particle velocity in the limit of high quantum
numbers. The classical kinetic momentum p above is then replaced
by $m v_g$ in the limit. Following Fermi, this prompts us to further
assume independently that

$$\frac{1}{v_g} = \frac{1}{\sqrt{(2/m)\,[H(\omega) - U(x)]}}, \tag{3.127}$$

where we have replaced the classical conserved total energy H with
the undetermined function $H(\omega)$. Equating the two expressions
(3.124) and (3.127) for $1/v_g$ with the condition (3.125), we find
that

$$\frac{dH}{d\omega} = \frac{\sqrt{2m}}{\omega f(\omega)} = \mathrm{const}. \tag{3.128}$$

It can be shown experimentally, by electron diffraction by crystals,
for example, that the constant on the far right must be $\hbar$. This
leads to

$$H = \hbar\omega, \qquad \omega f(\omega) = \frac{\sqrt{2m}}{\hbar}. \tag{3.129}$$

The total energy eigenvalue H is determined to within an arbitrary
additive integration constant. The equation on the right vindicates
our assumption (3.126) that $\omega f(\omega) = \mathrm{const}$. Substituting above,
this leads to

$$k = \frac{\sqrt{2m\,[\hbar\omega - U(x)]}}{\hbar}. \tag{3.130}$$

Equivalently,

$$\frac{\hbar^2 k^2}{2m} + U(x) = \hbar\omega. \tag{3.131}$$


We recognize this as the dispersion relation resulting from conservation
of total energy, where $\hbar\omega$ is the total energy eigenvalue, and
$\hbar k$ is the momentum eigenvalue. Both $\omega$ and k are evaluated at the
central values which characterize the wave packet or superposition
state.
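
The content of the dispersion relation can be illustrated numerically. The following sketch, a check under assumed values rather than part of the original analysis, differentiates $\omega(k)$ by finite differences and compares the resulting group velocity with the classical particle velocity. The total energy (1 keV) and local potential energy (200 eV) are hypothetical.

```python
import math

HBAR = 1.054571817e-34      # J s
M_E = 9.1093837015e-31      # kg
E_CHARGE = 1.602176634e-19  # C

def omega_of_k(k, u):
    """Dispersion relation (3.131): hbar*omega = hbar^2 k^2 / (2m) + U."""
    return HBAR * k * k / (2.0 * M_E) + u / HBAR

def k_of_energy(h_total, u):
    """Wave number from (3.130): k = sqrt(2m (H - U)) / hbar."""
    return math.sqrt(2.0 * M_E * (h_total - u)) / HBAR

# Assumed illustrative values (not from the text)
H_TOTAL = 1000.0 * E_CHARGE  # total energy, J
U_LOCAL = 200.0 * E_CHARGE   # local potential energy, J

k0 = k_of_energy(H_TOTAL, U_LOCAL)
dk = 1.0e-6 * k0
# group velocity d(omega)/dk by central difference, cf. (3.123)
v_group = (omega_of_k(k0 + dk, U_LOCAL) - omega_of_k(k0 - dk, U_LOCAL)) / (2.0 * dk)
# classical velocity from the kinetic energy H - U
v_classical = math.sqrt(2.0 * (H_TOTAL - U_LOCAL) / M_E)
# phase velocity omega / k, cf. (3.121)
v_phase = omega_of_k(k0, U_LOCAL) / k0

print(f"group velocity     = {v_group:.4e} m/s")
print(f"classical velocity = {v_classical:.4e} m/s")
print(f"phase velocity     = {v_phase:.4e} m/s")
```

The group velocity reproduces the classical velocity, while the phase velocity does not. The phase velocity also depends on the arbitrary additive constant in H, so it carries no direct mechanical meaning.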


This analysis shows that the quantum mechanics of single-particle 
propagation can be described in terms of a variational principle
and the resulting dispersion relation. This is completely equivalent
to the postulate-based approach described earlier. Physically,
a close analogy exists with the wave optics of light propagation. 


3.2 Particle motion in a general electromagnetic potential


In the preceding sections we have reviewed the conceptual basis of 
quantum mechanics, as it relates to the motion of a single particle. 
We are now in a position to include electric and magnetic effects. 
For the present purpose, we confine our attention to fields which 
vary slowly in space and time. This permits a more traditional
approach, based on Schrödinger theory. To this end, we consider
all relevant information about the electric and magnetic effects
to be contained in the electrostatic scalar potential $\phi(\mathbf{x},t)$ and
the magnetic vector potential $\mathbf{A}(\mathbf{x},t)$, respectively, where these
potentials are functions of position x and time t. Together, these
potentials form the components of a Lorentz-covariant four-vector
(2.5), which we refer to as a general electromagnetic potential.
The central problem is to solve for the wave function $\psi(\mathbf{x}, t)$ in the
presence of a general electromagnetic potential, where the absolute
square of this function is the probability density for finding the
particle at position x and time t.


3.2.1 Path integral approach for the time-dependent wave function


A great deal of physical insight can be gained from the path inte- 
gral description of quantum mechanics, originally due to Feynman 
[28]. The reader is referred to an emended version by Styer of the
original text by Feynman and Hibbs [29] for a detailed, compre- 
hensive, and highly readable description. Our present goal is to 
summarize the highlights of the text, with particular application 
to the motion of a single charged particle moving in a general elec- 
tromagnetic potential. 


We begin by studying the motion of a particle in one dimension, 
where the position x is a function of time t. Classically, the motion 
from an initial time $t_a$ to a later time $t_b$ is along that path which
represents an extremum of the action integral $S_{ba}$ given by

$$S_{ba} = \int_{t_a}^{t_b} L(x, v; t)\, dt, \tag{3.132}$$

where L is the Lagrangian given by (2.9). The beginning point
characterized by position and time $(x_a, t_a)$, and the end point
characterized by position and time $(x_b, t_b)$ are assumed to be fixed.


In a quantum-mechanical description we seek a probability amplitude
$\varphi_{ba}$ for the particle to propagate from an initial position
and time $(x_a, t_a)$ to a final position and time $(x_b, t_b)$. Again we
assume that the end points $(x_a, t_a)$ and $(x_b, t_b)$ are fixed.


At this point we form a key hypothesis, namely, the amplitude
$\varphi_{ba}$ can be written for a given path of motion as

$$\varphi_{ba} = \mathrm{const} \times \exp\left[\frac{i}{\hbar}\, S_{ba}\right], \tag{3.133}$$

where $S_{ba}$ is the action integral. An infinite number of possible
paths exist, each path having a distinct value of $S_{ba}$. This is
illustrated for a hypothetical system in Figure 3.2. The general rule
in quantum mechanics is that the amplitudes for all alternative
paths must be added to form the resultant amplitude. The absolute
square of this resultant amplitude then represents the probability
density for finding the system at a given coordinate x. The
amplitudes $\varphi_{ba}$ for all possible paths must therefore be summed to
form the overall probability amplitude $K(x_b, t_b; x_a, t_a)$. Following
Feynman and Hibbs, we will refer to this overall amplitude as a
kernel, and the summation over all possible paths as a path integral.


Figure 3.2: Possible paths between two fixed points in one spatial
dimension.


The absolute square of the kernel represents the probability density
$P_{ba}$ for finding the particle at position $x_b$ at time $t_b$, having
started at position $x_a$ at an earlier time $t_a$. Equivalently,

$$P_{ba} = |K(x_b, t_b; x_a, t_a)|^2. \tag{3.134}$$


For a charged particle optics system of macroscopic dimensions,
the action integral $S_{ba}$ is very large compared to $\hbar$. A small variation
in path therefore results in a large variation in the action
integral divided by $\hbar$, or equivalently in the phase of $\varphi_{ba}$. The
classical path of motion is labeled 1 in the figure, with the path
shown as bold. According to Hamilton’s principle of least action
(2.8), the action integral $S_{ba}$ is stationary with respect to first-order
variations about this classical path. Consequently, all paths
2 in the immediate vicinity of path 1 have approximately the same
phase for $\varphi_{ba}$. The waves for these paths therefore interfere
constructively.


For paths which are remote from the classical path, a small variation
in path leads to a large variation in phase. This is represented
by paths 3 and 4 in Figure 3.2. The phase factor in the expression
for $\varphi_{ba}$ oscillates rapidly for these paths. The sum over these
paths therefore is very close to zero on average. Only paths in the
immediate vicinity of the classical path contribute significantly to
the overall amplitude $K(x_b, t_b; x_a, t_a)$. Quantitatively, the action
integrals $S_{ba}$ for the two nearby paths 1 and 2 can at most differ
by a reasonably small fraction of $\hbar$ for constructive interference to
occur.
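
The cancellation of remote paths can be made concrete with a one-parameter family of paths. The sketch below is an illustration under stated assumptions and is not the full path integral: for a free particle, the paths $x(t) = x_{cl}(t) + a\sin(\pi t/T)$ have an action exceeding the classical value by $m\pi^2 a^2/(4T)$, so each path contributes a phase factor $\exp(i\kappa a^2)$ with $\kappa = m\pi^2/(4T\hbar)$. The family of paths, the dimensionless choice $\kappa = 1$, and the helper name `window_amplitude` are all hypothetical.

```python
import cmath

def window_amplitude(center, width, kappa=1.0, steps=20000):
    """Midpoint-rule integral of exp(i*kappa*a^2) over path deviations a in
    [center - width/2, center + width/2]."""
    da = width / steps
    start = center - 0.5 * width
    total = 0.0 + 0.0j
    for n in range(steps):
        a = start + (n + 0.5) * da
        total += cmath.exp(1j * kappa * a * a)
    return total * da

# Paths close to the classical path (a near 0) add coherently ...
near = abs(window_amplitude(0.0, 2.0))
# ... while an equally wide window of remote paths nearly cancels.
far = abs(window_amplitude(20.0, 2.0))
print(near, far)
```

For a macroscopic system $\kappa$ is enormous, so the window of deviations a that contributes appreciably shrinks to a very narrow neighborhood of the classical path.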


Further physical insight can be gained by noticing that, for any
single path,

$$\varphi_{ba} = \varphi_{bc}\,\varphi_{ca}, \tag{3.135}$$

where $(x_c, t_c)$ is any intermediate space-time point along the particular
path. This arises from (3.132), which leads directly to

$$S_{ba} = S_{bc} + S_{ca}. \tag{3.136}$$


This is depicted for a hypothetical system in Figure 3.3. The motion
can be decomposed into a path from $(x_a, t_a)$ to an intermediate
point $(x_c, t_c)$, followed by a path from $(x_c, t_c)$ to the end point
at $(x_b, t_b)$. Since the kernel $K(x_b, t_b; x_a, t_a)$ is the integral over all
possible paths, it follows that

$$K(x_b, t_b; x_a, t_a) = \int_{-\infty}^{\infty} K(x_b, t_b; x_c, t_c)\, K(x_c, t_c; x_a, t_a)\, dx_c. \tag{3.137}$$

This represents the motion between a specific starting point
$(x_a, t_a)$ and a specific end point $(x_b, t_b)$.


Figure 3.3: Evolution of possible paths through an intermediate
point.


For many purposes it is sufficient to know the state of the system
at a given end point $(x_b, t_b)$, without regard for the prior history of
how the system got there. To this end we define the wave function
$\psi(x_b, t_b)$ as

$$\psi(x_b, t_b) = K(x_b, t_b; x_a, t_a), \tag{3.138}$$

that is, we simply ignore the fact that the motion started at a
particular point $(x_a, t_a)$. Taking this assumption into account, it
follows that

$$\psi(x_b, t_b) = \int_{-\infty}^{\infty} K(x_b, t_b; x_a, t_a)\, \psi(x_a, t_a)\, dx_a, \tag{3.139}$$

where we have relabeled the indices.


Incidentally, this concept can be extended to any number of in- 
termediate points, including a large number of points spaced in- 
finitesimally close to one another. This leads to a method of more 
rigorously evaluating the path integral. The reader is referred to 
[29] for the mathematical details. 
In words, (3.139) states that the wave function at any given point
in space-time represents the summation over all possible prior histories.
In addition, it follows from (3.134) that the probability
density $P(x_b, t_b)$ for finding the particle at position $x_b$ at time $t_b$
is given by

$$P(x_b, t_b) = |\psi(x_b, t_b)|^2. \tag{3.140}$$


Given this, it is of great interest to explore the evolution of the
wave function in a differential sense, where the end time $t_b$ differs
from the initial time $t_a$ by a differential time interval $\epsilon$. By this
method we set out to derive a differential equation which describes
the evolution of the wave function $\psi(x,t)$ in space-time. Applying
(3.139) it follows that

$$\psi(x, t+\epsilon) = \int_{-\infty}^{\infty} K(x, t+\epsilon; x_a, t)\, \psi(x_a, t)\, dx_a. \tag{3.141}$$


Because this represents an infinitesimal increment in space-time, it
follows that virtually all of the contribution is due to paths in the
immediate vicinity of (x,t). We therefore make the substitution
$x_a = x + \eta$, where $\eta$ is a small increment in position relative to x.
Substituting into (3.141) we obtain

$$\psi(x, t+\epsilon) = \int_{-\infty}^{\infty} K(x, t+\epsilon; x+\eta, t)\, \psi(x+\eta, t)\, d\eta. \tag{3.142}$$


The kernel K is given to good approximation by

$$K(x, t+\epsilon; x+\eta, t) = \frac{1}{A} \exp\left[\frac{i}{\hbar} \int_t^{t+\epsilon} L(x, v; t')\, dt'\right], \tag{3.143}$$

where A is a normalization constant, yet to be determined. For
the infinitesimal integration interval this in turn reduces to

$$K(x, t+\epsilon; x+\eta, t) \approx \frac{1}{A} \exp\left[\frac{i}{\hbar}\, \epsilon\, L\left(x + \frac{\eta}{2},\, -\frac{\eta}{\epsilon}\right)\right], \tag{3.144}$$

where the first argument of L is the position and the second argument
is the velocity, both averaged over the infinitesimal integration
interval. The nonrelativistic approximation for the Lagrangian
(2.7) is

$$L(x, v; t) = \tfrac{1}{2} m v^2 + q\, v \cdot A(x, t) - q\,\phi(x, t). \tag{3.145}$$
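We assume for now that A = 0 for the magnetic vector potential.
Substituting, the wave function becomes

$$\psi(x, t+\epsilon) = \frac{1}{A} \int_{-\infty}^{\infty} \exp\left[\frac{i m \eta^2}{2\hbar\epsilon}\right] \exp\left[-\frac{i}{\hbar}\, \epsilon\, q\, \phi\left(x + \frac{\eta}{2}, t\right)\right] \psi(x+\eta, t)\, d\eta, \tag{3.146}$$

where most of the contribution to the integral is for small values
of $\eta$. Next we expand $\psi(x,t)$ in a power series to first order in $\epsilon$
and second order in $\eta$. This gives

$$\psi(x,t) + \epsilon\, \frac{\partial\psi}{\partial t} = \frac{1}{A} \int_{-\infty}^{\infty} \exp\left[\frac{i m \eta^2}{2\hbar\epsilon}\right] \left[1 - \frac{i}{\hbar}\,\epsilon\, q\,\phi(x,t)\right] \left[\psi(x,t) + \eta\, \frac{\partial\psi}{\partial x} + \frac{\eta^2}{2}\, \frac{\partial^2\psi}{\partial x^2}\right] d\eta. \tag{3.147}$$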


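Equating the leading terms on both sides, we must have, to zero
order in $\epsilon$,

$$\psi(x,t) = \psi(x,t)\,\frac{1}{A}\int_{-\infty}^{\infty}\exp\left[\frac{i m \eta^2}{2\hbar\epsilon}\right] d\eta = \psi(x,t)\,\frac{1}{A}\left(\frac{2\pi i \hbar \epsilon}{m}\right)^{1/2}. \tag{3.148}$$

Consequently, the normalization constant A is given by

$$A = \left(\frac{2\pi i \hbar \epsilon}{m}\right)^{1/2}. \tag{3.149}$$
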


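Continuing to evaluate the right-hand side of (3.147), we make use
of the two integrals

$$\frac{1}{A}\int_{-\infty}^{\infty} \eta\, \exp\left[\frac{i m \eta^2}{2\hbar\epsilon}\right] d\eta = 0, \qquad \frac{1}{A}\int_{-\infty}^{\infty} \eta^2 \exp\left[\frac{i m \eta^2}{2\hbar\epsilon}\right] d\eta = \frac{i\hbar\epsilon}{m}. \tag{3.150}$$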
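
The normalization constant and the two integrals above follow from the standard Fresnel (complex Gaussian) integral. A brief sketch, in which the shorthand $a = m/(2\hbar\epsilon)$ is introduced here only for convenience:

```latex
\begin{aligned}
\int_{-\infty}^{\infty} e^{i a \eta^2}\, d\eta
  &= \left(\frac{i\pi}{a}\right)^{1/2}
   = \left(\frac{2\pi i \hbar \epsilon}{m}\right)^{1/2} = A, \\
\int_{-\infty}^{\infty} \eta\, e^{i a \eta^2}\, d\eta
  &= 0 \quad \text{(odd integrand)}, \\
\int_{-\infty}^{\infty} \eta^2 e^{i a \eta^2}\, d\eta
  &= \frac{1}{i}\,\frac{\partial}{\partial a}\left(\frac{i\pi}{a}\right)^{1/2}
   = \frac{i}{2a}\left(\frac{i\pi}{a}\right)^{1/2}
   = \frac{i\hbar\epsilon}{m}\, A .
\end{aligned}
```

Dividing the last two lines by A reproduces the two integrals quoted in the text.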
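Substituting, we obtain

$$\psi(x,t) + \epsilon\, \frac{\partial}{\partial t}\psi(x,t) = \psi(x,t) - \frac{i}{\hbar}\,\epsilon\, q\,\phi(x,t)\,\psi(x,t) + \frac{i\hbar\epsilon}{2m}\, \frac{\partial^2}{\partial x^2}\psi(x,t). \tag{3.151}$$

Equivalently,

$$\left[-\frac{\hbar^2}{2m}\, \frac{\partial^2}{\partial x^2} + q\,\phi(x,t)\right] \psi(x,t) = i\hbar\, \frac{\partial}{\partial t}\psi(x,t). \tag{3.152}$$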


We recognize this as the time-dependent Schrödinger equation
(3.13) for one spatial dimension. Since the wave function $\psi(x, t)$
is itself a kernel, it follows that the kernel $K(x_b, t_b; x_a, t_a)$ satisfies
Schrödinger’s equation as well. It is straightforward to generalize
the above arguments to three spatial dimensions, in which case 
one obtains the full time-dependent Schrödinger equation (3.13) 
in three spatial dimensions. 
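
The statement that the kernel itself satisfies Schrödinger’s equation can be checked numerically for a free particle. The sketch below uses the closed-form free-particle kernel quoted in Problem 2 of this section, with the starting point placed at the origin and with natural units $\hbar = m = 1$ (both are assumptions made for convenience). The helper names `free_kernel` and `schrodinger_residual` are hypothetical.

```python
import cmath
import math

HBAR = 1.0  # natural units, assumed for this check
MASS = 1.0

def free_kernel(x, t):
    """Free-particle kernel K0 for propagation from (0, 0) to (x, t), t > 0."""
    prefactor = cmath.sqrt(MASS / (2.0j * math.pi * HBAR * t))
    return prefactor * cmath.exp(1j * MASS * x * x / (2.0 * HBAR * t))

def schrodinger_residual(x, t, h=1.0e-4):
    """|i*hbar dK/dt + (hbar^2 / 2m) d2K/dx2|, estimated by central differences.
    This vanishes (to discretization error) if K0 solves the free equation."""
    dk_dt = (free_kernel(x, t + h) - free_kernel(x, t - h)) / (2.0 * h)
    d2k_dx2 = (free_kernel(x + h, t) - 2.0 * free_kernel(x, t)
               + free_kernel(x - h, t)) / (h * h)
    return abs(1j * HBAR * dk_dt + (HBAR ** 2 / (2.0 * MASS)) * d2k_dx2)

residual = schrodinger_residual(0.7, 1.3)
scale = abs(free_kernel(0.7, 1.3))
print(residual, scale)  # the residual is tiny compared with |K0|
```
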


This shows the connection between the path integral approach 
and the more traditional approach. It also vindicates our initial 
choice for the form (3.133) of the amplitude $\varphi_{ba}$. For a charged
particle optics system of macroscopic dimensions, only paths in- 
finitesimally close to the classical path of motion, including the 
classical path itself, contribute significantly to the wave function. 


It should be added in this context that the path integral approach 
is quite general, and applies to systems of atomic dimensions as 
well as systems of macroscopic dimensions. In an atomic system, 
the path integral is of the order of $\hbar$ for all paths. Consequently,
all paths must be included in the path integral. This highlights 
the simplification which is possible for a charged particle system 
of macroscopic dimensions. 
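
The contrast between macroscopic and atomic systems can be estimated in a few lines. The sketch below evaluates $(1/\hbar)\int p\, ds$ for a constant kinetic momentum along a straight path, as a rough proxy for the size of the phase. Both sets of numbers (a 1 keV electron over a 10 cm column, and a 13.6 eV electron over 0.1 nm) are assumed values chosen only to indicate orders of magnitude.

```python
import math

HBAR = 1.054571817e-34      # J s
M_E = 9.1093837015e-31      # kg
E_CHARGE = 1.602176634e-19  # C

def reduced_action_over_hbar(kinetic_energy_ev, path_length_m):
    """(1/hbar) * integral of p ds for constant kinetic momentum p."""
    p = math.sqrt(2.0 * M_E * kinetic_energy_ev * E_CHARGE)
    return p * path_length_m / HBAR

# Assumed illustrative cases (not from the text)
macroscopic = reduced_action_over_hbar(1.0e3, 0.1)  # 1 keV beam, 10 cm column
atomic = reduced_action_over_hbar(13.6, 1.0e-10)    # ~13.6 eV over ~0.1 nm

print(f"macroscopic: {macroscopic:.2e}")
print(f"atomic:      {atomic:.2f}")
```

The macroscopic phase is of the order of ten billion radians, while the atomic phase is of order unity, which is why no single path can be singled out in the atomic case.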


In most cases it is simpler to solve a differential equation than 
to perform the path integral. In the next section we investigate 
solutions to the Schrödinger equation for a single charged particle
in a general electromagnetic potential. 
Problems


1. Show that, for a free particle in one spatial dimension, the classical
action integral $S_{ba}$ given by (2.10) can be evaluated in closed
form in the non-relativistic approximation as

$$S_{ba} = \frac{m}{2}\, \frac{(x_b - x_a)^2}{t_b - t_a}. \tag{3.153}$$


2. For a free particle, the kernel $K_0(x_b, t_b; x_a, t_a)$ can be evaluated
in principle by subdividing the interval $(x_b, t_b; x_a, t_a)$ into N
subintervals of equal time step $\epsilon$. Summing over all possible paths,
this leads to

$$K_0(x_b, t_b; x_a, t_a) = \lim_{\epsilon \to 0} \frac{1}{A} \int \cdots \int \exp\left[\frac{i}{\hbar}\, S_{ba}\right] \frac{dx_1}{A}\, \frac{dx_2}{A} \cdots \frac{dx_{N-1}}{A}, \tag{3.154}$$

where A is given by (3.149), and $S_{ba}$ is the free-particle action
integral from the preceding problem. Show by repeated integration
that the free-particle kernel $K_0$ can be expressed in closed form as

$$K_0(x_b, t_b; x_a, t_a) = \left[\frac{m}{2\pi i \hbar\, (t_b - t_a)}\right]^{1/2} \exp\left[\frac{i m (x_b - x_a)^2}{2\hbar\, (t_b - t_a)}\right]. \tag{3.155}$$

Note that the probability density $P_{ba}$ that the particle arrives at
$(x_b, t_b)$ is proportional to the absolute square of the kernel $K_0$.
This is

$$P_{ba} \propto |K_0(x_b, t_b; x_a, t_a)|^2 = \frac{m}{2\pi\hbar\, (t_b - t_a)}. \tag{3.156}$$

(Hint: the integral of a Gaussian function is also a Gaussian function.
See [29, page 42] for detailed discussion.)


3. Show that, in three Cartesian dimensions, the wave function
$\psi(\mathbf{x}_b, t_b)$ satisfies the three-dimensional time-dependent
Schrödinger equation (3.13). (Hint: this is a straightforward
generalization of the derivation for one dimension.)
3.2.2 Series solution for a particle in a general electromagnetic potential


The central problem in this section is to solve for the single-particle
wave function $\psi(\mathbf{x}, t)$ in three spatial dimensions, in the presence of
a general electromagnetic potential with components $\mathbf{A}(\mathbf{x},t)$ and
$\phi(\mathbf{x},t)$. All relevant electromagnetic effects are contained in these
potentials. All relevant quantum-mechanical information about
the particle is contained in the wave function $\psi(\mathbf{x}, t)$.


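Applying (3.4-3.13) we write the generalized time-dependent
(non-relativistic) Schrödinger equation as

$$i\hbar\, \frac{\partial}{\partial t}\psi(\mathbf{x},t) = \frac{1}{2m}\left[-i\hbar\nabla - q\mathbf{A}(\mathbf{x},t)\right]^2 \psi(\mathbf{x},t) + q\,\phi(\mathbf{x},t)\,\psi(\mathbf{x},t). \tag{3.157}$$

Applying the square bracket twice in succession, we obtain

$$\left[-i\hbar\nabla - q\mathbf{A}(\mathbf{x},t)\right]^2 \psi(\mathbf{x},t) = \left[-\hbar^2\nabla^2 + 2i\hbar q\,\mathbf{A}\cdot\nabla + i\hbar q\,(\nabla\cdot\mathbf{A}) + q^2 A^2\right] \psi(\mathbf{x},t). \tag{3.158}$$

This leads to the wave equation

$$i\hbar\, \frac{\partial}{\partial t}\psi(\mathbf{x},t) = \frac{1}{2m}\left[-\hbar^2\nabla^2 + 2i\hbar q\,\mathbf{A}\cdot\nabla + i\hbar q\,(\nabla\cdot\mathbf{A}) + q^2 A^2\right] \psi(\mathbf{x},t) + q\,\phi\,\psi(\mathbf{x},t), \tag{3.159}$$

which immediately reduces to the time-dependent Schrödinger
equation (3.11) in the limit A = 0 where no magnetic effects are
present.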
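
The expansion of the squared operator relies on the product rule, since the gradient acts on both the vector potential and the wave function. Written out as an intermediate step (a sketch using only the identity $\nabla\cdot(\mathbf{A}\psi) = (\nabla\cdot\mathbf{A})\psi + \mathbf{A}\cdot\nabla\psi$):

```latex
\begin{aligned}
\left[-i\hbar\nabla - q\mathbf{A}\right]^2 \psi
 &= -i\hbar\nabla\cdot\left(-i\hbar\nabla\psi - q\mathbf{A}\psi\right)
    - q\mathbf{A}\cdot\left(-i\hbar\nabla\psi - q\mathbf{A}\psi\right) \\
 &= -\hbar^2\nabla^2\psi + i\hbar q\,\nabla\cdot(\mathbf{A}\psi)
    + i\hbar q\,\mathbf{A}\cdot\nabla\psi + q^2 A^2\psi \\
 &= -\hbar^2\nabla^2\psi + 2i\hbar q\,\mathbf{A}\cdot\nabla\psi
    + i\hbar q\,(\nabla\cdot\mathbf{A})\,\psi + q^2 A^2\psi .
\end{aligned}
```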


We seek a form for the wave function $\psi(\mathbf{x},t)$ which approximates
a free particle in the case where the electromagnetic potentials
$\mathbf{A}(\mathbf{x},t)$ and $\phi(\mathbf{x},t)$ are slowly varying in space-time. To this end
we assume that the solution $\psi(\mathbf{x}, t)$ can be expressed in the form

$$\psi(\mathbf{x},t) = \exp\left[\frac{i}{\hbar}\, S(\mathbf{x},t)\right], \tag{3.160}$$

where $S(\mathbf{x},t)$ has yet to be determined. There is no loss of generality,
assuming $S(\mathbf{x},t)$ is complex, and remembering that $\psi(\mathbf{x}, t)$
can be multiplied by an arbitrary normalization constant without
affecting the validity of the solution. We can write down a few
useful identities as follows:


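$$\nabla\psi = \frac{i}{\hbar}\,(\nabla S)\,\psi, \qquad \nabla^2\psi = \left[-\frac{1}{\hbar^2}\,(\nabla S)^2 + \frac{i}{\hbar}\,\nabla^2 S\right]\psi, \qquad \frac{\partial\psi}{\partial t} = \frac{i}{\hbar}\,\frac{\partial S}{\partial t}\,\psi. \tag{3.161}$$

Substituting these into (3.159), it is straightforward to show that
$S(\mathbf{x}, t)$ satisfies

$$(\nabla S - q\mathbf{A})^2 - i\hbar\,\nabla\cdot(\nabla S - q\mathbf{A}) + 2m\left(\frac{\partial S}{\partial t} + q\phi\right) = 0. \tag{3.162}$$
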
The second term on the left is obviously proportional to $\hbar$. For
a single particle in an unbound state, we can regard this term as
small relative to the other terms. (This will be justified later.) In
this case we can approximate

$$(\nabla S - q\mathbf{A})^2 + 2m\left(\frac{\partial S}{\partial t} + q\phi\right) \approx 0. \tag{3.163}$$

We immediately notice the striking fact that this is precisely the
classical Hamilton-Jacobi equation of motion (2.313). We conclude
from this that the function $S(\mathbf{x}, t)$ is approximately identified
with Hamilton’s principal function.


Equation (3.162) is nonlinear, and as such cannot be solved in 
closed form. We therefore seek a suitable approximation. To this 
end, we write $S(\mathbf{x},t)$ as an infinite series,

$$S(\mathbf{x}, t) = S_0(\mathbf{x}, t) + \hbar\, S_1(\mathbf{x},t) + \hbar^2 S_2(\mathbf{x},t) + \cdots. \tag{3.164}$$
Substituting, and collecting terms in the various powers of $\hbar$, we
obtain

$$\begin{aligned}
0 ={}& \left[(\nabla S_0 - q\mathbf{A})^2 + 2m\left(\frac{\partial S_0}{\partial t} + q\phi\right)\right] \\
&+ \hbar\left[2\nabla S_1\cdot(\nabla S_0 - q\mathbf{A}) - i\,\nabla\cdot(\nabla S_0 - q\mathbf{A}) + 2m\,\frac{\partial S_1}{\partial t}\right] \\
&+ \hbar^2\left[2\nabla S_2\cdot(\nabla S_0 - q\mathbf{A}) - i\,\nabla^2 S_1 + (\nabla S_1)^2 + 2m\,\frac{\partial S_2}{\partial t}\right] \\
&+ O(\hbar^3).
\end{aligned} \tag{3.165}$$

In order for this series to converge to a sensible result, the individual
terms must become successively smaller. Physically, we expect
that the motion must approach the classical motion if we regard
$\hbar$ to approach zero. Anticipating passage to the classical limit, we
therefore regard $\hbar$ to be small, but variable. This requires that
each of the quantities in square brackets must vanish separately,
thus leading to a set of coupled equations for $S_0, S_1, S_2, \ldots$.


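Taking the first equation in the series, we write

$$(\nabla S_0 - q\mathbf{A})^2 + 2m\left(\frac{\partial S_0}{\partial t} + q\phi\right) = 0, \tag{3.166}$$

recalling that we regard the potentials $\phi(\mathbf{x},t)$ and $\mathbf{A}(\mathbf{x},t)$ to be
functions of position x and time t.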


Next we seek the solution for $S_0(\mathbf{x}, t)$. Rearranging terms, we can
write this as

$$\frac{\partial S_0}{\partial t} = -\frac{1}{2m}\,(\nabla S_0 - q\mathbf{A})^2 - q\phi. \tag{3.167}$$

The right-hand side is recognizable as the negative of the classical
Hamiltonian H, where we make the identification

$$\nabla S_0 = \mathbf{P}, \tag{3.168}$$


and P is the canonical momentum. In this approximation, (3.166)
is precisely the classical Hamilton-Jacobi equation of motion
(2.313). We conclude from this that the function $S_0(\mathbf{x}, t)$ is identified
with Hamilton’s principal function. Based on the expression
(2.327) for Hamilton’s principal function, we are prompted to propose
a solution for $S_0$ as follows:

$$S_0(\mathbf{x}, t; \mathbf{x}_a, t_a) = \int_{t_a}^{t} L(\mathbf{x}, \mathbf{v}; t')\, dt', \tag{3.169}$$

where the right-hand side is the action integral in Hamilton’s principle
of least action. Substituting this solution into (3.166) and
making use of (3.168), it is straightforward to verify that this
is indeed the correct solution. Furthermore, substituting $S_0$ into
(3.160) we recover (3.133) from the path integral approach described
earlier.


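Forming the second equation for $S_1$ from (3.165) we have

$$2\nabla S_1\cdot(\nabla S_0 - q\mathbf{A}) - i\,\nabla\cdot(\nabla S_0 - q\mathbf{A}) + 2m\,\frac{\partial S_1}{\partial t} = 0. \tag{3.170}$$

Making use of (3.168) this reduces to

$$2\nabla S_1\cdot\mathbf{p} - i\,\nabla\cdot\mathbf{p} + 2m\,\frac{\partial S_1}{\partial t} = 0, \tag{3.171}$$

where p is the kinetic momentum given by p = P − qA. Assuming
the potentials $\mathbf{A}(\mathbf{x},t)$ and $\phi(\mathbf{x},t)$ are slowly varying, we can
approximate this as

$$2p\,\frac{\partial S_1}{\partial s} - i\,\frac{\partial p}{\partial s} + 2m\,\frac{\partial S_1}{\partial t} = 0, \tag{3.172}$$

where s is the coordinate along the path of motion, to which the
kinetic momentum p is locally tangent. This reduces to

$$\left(\frac{\partial}{\partial s} + \frac{m}{p}\,\frac{\partial}{\partial t}\right) S_1 = \frac{i}{2}\,\frac{\partial}{\partial s}\,(\ln p). \tag{3.173}$$
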

This equation can be solved in principle for $S_1$. We assume that
the further terms in the series (3.165) for S become progressively
smaller for an unbound particle. This will be discussed in more detail
later. In principle, we substitute the terms $S_0, S_1, S_2, \ldots$ into
(3.160) to form the wave function $\psi(\mathbf{x}, t)$.


We now turn our attention to the important special case where
the potentials A and $\phi$ have no explicit time dependence. In this
case the potentials can be written as $\mathbf{A}(\mathbf{x})$ and $\phi(\mathbf{x})$, respectively.
The earlier analysis showed that the Hamiltonian has no explicit
time dependence in this case, from which it follows that the total
energy H is conserved. According to Hamilton-Jacobi theory, the
function $S_0$ can be expressed as

$$S_0(\mathbf{x},t) = W_0(\mathbf{x}) - Ht, \tag{3.174}$$


where $W_0$ is Hamilton’s characteristic function. Noting that
$\nabla W_0 = \nabla S_0$, we obtain

$$(\nabla W_0 - q\mathbf{A})^2 = 2m\,(H - q\phi). \tag{3.175}$$

We recognize the right side as the square of the kinetic momentum
$[p(\mathbf{x})]^2$, where

$$[p(\mathbf{x})]^2 = 2m\,[H - q\phi(\mathbf{x})]. \tag{3.176}$$

This is satisfied by

$$\nabla W_0 - q\mathbf{A} = \pm\,\mathbf{p}(\mathbf{x}). \tag{3.177}$$

Retaining only the positive (right-propagating) root, and ignoring
the negative (left-propagating) root, we obtain

$$\nabla W_0 = \mathbf{P}(\mathbf{x}), \tag{3.178}$$

recalling that P = p + qA is the canonical momentum. Integrating,
we obtain

$$W_0(\mathbf{x}_b) - W_0(\mathbf{x}_a) = \int_{\mathbf{x}_a}^{\mathbf{x}_b} \mathbf{P}\cdot d\mathbf{x}, \tag{3.179}$$

where the integral is a line integral along a path joining the points
$\mathbf{x}_a$ and $\mathbf{x}_b$. Substituting, this leads to

$$S_0(\mathbf{x}_b, t_b) = S_0(\mathbf{x}_a, t_a) + \int_{\mathbf{x}_a}^{\mathbf{x}_b} \mathbf{P}\cdot d\mathbf{x} - H\,(t_b - t_a). \tag{3.180}$$


The second term in (3.165) leads to 


Substituting the solution for $S_0(\mathbf{x},t)$, this becomes

$$\mathbf{p}\cdot\nabla S_1 = \frac{i}{2}\,\nabla\cdot\mathbf{p}. \tag{3.182}$$


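We now assume that the potentials $\mathbf{A}(\mathbf{x})$ and $\phi(\mathbf{x})$ do not vary
significantly over distances comparable with the de Broglie wavelength
$\lambda = h/p$. This allows the approximation

$$p\,\frac{\partial S_1}{\partial s} \approx \frac{i}{2}\,\frac{\partial p}{\partial s}, \tag{3.183}$$

where s represents the coordinate along the path of motion, to
which the kinetic momentum vector p is locally tangent. This is
equivalent to

$$\frac{\partial S_1}{\partial s} = \frac{i}{2}\,\frac{\partial}{\partial s}\,(\ln p). \tag{3.184}$$

We immediately perform the line integral to obtain

$$S_1(\mathbf{x}_b) - S_1(\mathbf{x}_a) = \frac{i}{2}\left[\ln p(\mathbf{x}_b) - \ln p(\mathbf{x}_a)\right] = i \ln\left[\frac{p(\mathbf{x}_b)}{p(\mathbf{x}_a)}\right]^{1/2}. \tag{3.185}$$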


Recalling the definition (3.164) for S, and substituting the results
for $S_0$ and $S_1$, we obtain the solution for the wave function for
time-independent potentials $\phi(\mathbf{x})$ and $\mathbf{A}(\mathbf{x})$ as

$$\psi(\mathbf{x}_b, t_b) = \psi(\mathbf{x}_a, t_a) \left[\frac{p(\mathbf{x}_a)}{p(\mathbf{x}_b)}\right]^{1/2} \exp\left[\frac{i}{\hbar}\int_{\mathbf{x}_a}^{\mathbf{x}_b} \mathbf{P}\cdot d\mathbf{x} - \frac{iH}{\hbar}\,(t_b - t_a)\right], \tag{3.186}$$

recalling that H is the conserved total energy. We have ignored
terms of order $\hbar^2$ in the expansion for S, since these are expected
to be relatively small for a single particle in an unbound state.


This solution is known as the WKB approximation [59, 79]. It 
has an important, and still very relevant, history in quantum mechanics,
in explaining the classical limit of large quantum numbers.
For a bound system the approximation breaks down at
the classical turning points where the kinetic momentum
$p(\mathbf{x}) = \sqrt{2m\,(H - q\phi)} = 0$. Physically, the WKB approximation applies
in the case where the fractional change in the electrostatic potential
$\phi(\mathbf{x})$ is small over a distance comparable with the de Broglie
wavelength. For an unbound system at relatively high energy, the
approximation is excellent. In the free-particle case where the po- 
tentials are zero everywhere, this solution reverts to the familiar 
plane wave solution as required. 


It is important to remember that this solution applies to a multi- 
plicity of paths, of which the classical trajectory is just one. These 
must be summed to obtain the overall wave function. This involves 
a procedure similar to (3.139), adapted to three dimensions. In 
most charged particle optical systems, the action integrals in the 
solutions (3.169, 3.186) are very large relative to $\hbar$. We showed
in the preceding section that only trajectories infinitesimally sep- 
arated from the classical trajectory, together with the classical 
trajectory itself, contribute appreciably to the overall wave func- 
tion. In this case it is a very good approximation to assume that 
the action integrals are applied only along the classical trajectory. 


This can be further understood by applying the operator for the 
canonical momentum P to the wave function (3.186). This gives 


$$-i\hbar\nabla\,\psi(\mathbf{x},t) = \mathbf{P}\,\psi(\mathbf{x},t). \tag{3.187}$$


Geometrically, this means that the canonical momentum vector P 
is perpendicular to the surfaces of constant phase. The kinetic mo- 
mentum vector p is everywhere tangent to the classical trajectory. 
In the presence of a magnetic vector potential A, this gives rise to 
a geometrical interpretation as shown in Figure 3.4. 
Figure 3.4: Particle momentum and the surfaces of constant phase.

The probability current j(x,t) is given by

$$\mathbf{j}(\mathbf{x}) = \frac{\hbar}{2mi}\left(\psi^*\nabla\psi - \psi\nabla\psi^*\right), \tag{3.188}$$

where the time drops out, as required in the case where the potentials
have no explicit time dependence. Substituting (3.187) the
reader can immediately deduce that

$$\mathbf{j}(\mathbf{x}) = \frac{\mathbf{P}}{m}\,\psi^*\psi. \tag{3.189}$$

This is equivalent to the continuity equation of probability, for
which the current density equals the velocity (momentum divided
by mass) times the probability density. In the WKB approximation,
we see immediately that

$$p(\mathbf{x})\,\psi^*\psi = \mathrm{const}. \tag{3.190}$$


This interpretation suggests a practical approach to computing
the wave function $\psi$. First, we compute the classical trajectory,
given a prespecified initial position $\mathbf{x}_a$ and velocity $\mathbf{v}_a$. Next, we
compute the classical action integral between the point $\mathbf{x}_a$ and any
other point $\mathbf{x}_b$. Finally, we divide by $\hbar$ to give the phase, thus
obtaining the wave function (3.186). Incidentally, this approach is
accurate in the sense that it implicitly includes all orders of
aberrations.
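
The procedure just described can be sketched in one dimension. The example below is a minimal illustration under stated assumptions, not a general implementation: the vector potential is zero, the classical trajectory is a straight line along x, the time factor $\exp[-iH(t_b - t_a)/\hbar]$ is omitted, and the potential energy is a hypothetical linear ramp of 500 eV over 1 cm acting on a 1 keV electron. The function names are likewise hypothetical.

```python
import cmath
import math

HBAR = 1.054571817e-34      # J s
M_E = 9.1093837015e-31      # kg
E_CHARGE = 1.602176634e-19  # C

def kinetic_momentum(x, h_total, potential_energy):
    """p(x) = sqrt(2m [H - U(x)]), valid where H > U(x)."""
    return math.sqrt(2.0 * M_E * (h_total - potential_energy(x)))

def wkb_wave_function(x_a, x_b, h_total, potential_energy,
                      psi_a=1.0 + 0.0j, steps=10000):
    """Spatial part of the WKB solution in one dimension with A = 0:
    psi_b = psi_a * sqrt(p_a / p_b) * exp[(i / hbar) * integral of p dx].
    Returns (psi_b, action), where action is the integral of p dx."""
    dx = (x_b - x_a) / steps
    action = 0.0
    for n in range(steps):
        # trapezoidal rule along the straight classical trajectory
        p_left = kinetic_momentum(x_a + n * dx, h_total, potential_energy)
        p_right = kinetic_momentum(x_a + (n + 1) * dx, h_total, potential_energy)
        action += 0.5 * (p_left + p_right) * dx
    p_a = kinetic_momentum(x_a, h_total, potential_energy)
    p_b = kinetic_momentum(x_b, h_total, potential_energy)
    psi_b = psi_a * math.sqrt(p_a / p_b) * cmath.exp(1j * action / HBAR)
    return psi_b, action

# Hypothetical example: 1 keV electron decelerated by a linear 500 eV ramp
H_TOTAL = 1000.0 * E_CHARGE

def ramp(x):
    """Assumed potential energy U(x) in joules, rising linearly over 1 cm."""
    return 500.0 * E_CHARGE * x / 0.01

psi_b, action = wkb_wave_function(0.0, 0.01, H_TOTAL, ramp)
p_a = kinetic_momentum(0.0, H_TOTAL, ramp)
p_b = kinetic_momentum(0.01, H_TOTAL, ramp)
print(f"phase / 2 pi = {action / HBAR / (2.0 * math.pi):.3e}")
print(f"p_a |psi_a|^2 = {p_a:.4e},  p_b |psi_b|^2 = {p_b * abs(psi_b) ** 2:.4e}")
```

The product of the kinetic momentum and the probability density is the same at both ends, as the WKB amplitude factor requires, and the accumulated phase is very large compared with unity.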


3.2.3 Quantum interference effects in electromagnetic potentials


In the preceding sections we investigated wave-optical interference 
which occurs when single free-particle amplitudes corresponding 
to alternative paths of motion add coherently. We now extend 
this discussion to the case where electromagnetic potentials are 
present. We consider a hypothetical monochromatic point source
of electrons at axial coordinate $z_a$, which coincides with the front
focal plane of a lens at axial coordinate $z_{L1}$. This is shown schematically
in Figure 3.5. A screen with two slits is located directly behind
the lens. The slits are illuminated by a monochromatic plane
wave in the paraxial approximation. A second lens at axial coordinate
$z_{L2}$ produces a diffraction pattern on a viewing screen located
at axial coordinate $z_b$, which is assumed to coincide with the back
focal plane of the second lens. The solid lines correspond to the
classical rays for the two alternative paths. Bright fringes appear 
where the amplitudes corresponding to two alternative paths add 
constructively. This occurs where the optical path lengths differ 
by an integral number of wavelengths. 


Next we assume a magnetic flux which is entirely confined to the 
cross-hatched circle, where the lines of flux are oriented perpen- 
dicular to the plane of the figure. Such a flux can be produced 
in principle by a very long solenoid with very fine, closely spaced 
windings, where the axis of the solenoid is also oriented
perpendicular to the plane of the figure. We assume that the magnetic
field is zero outside the cross-hatched region, and that the flux
region lies entirely within the geometric shadow of the two slits. It
follows that the electron experiences no magnetic Lorentz force, since
the magnetic field is zero wherever significant likelihood of finding
the electron exists. Strictly speaking, these assumptions can only be
approximately realized, since the flux lines must follow a return
path outside the solenoid. The magnetic field can be made arbitrarily
small by judicious design of the experimental configuration, however.


Figure 3.5: Two-slit interference in presence of a magnetic vector
potential.


The amplitude $\psi(\mathbf{x}_b, t_b)$ is given in terms of the
initial amplitude $\psi(\mathbf{x}_a, t_a)$ by (3.186)


$$ \psi(\mathbf{x}_b, t_b) = \psi(\mathbf{x}_a, t_a) \left[\frac{p(\mathbf{x}_a)}{p(\mathbf{x}_b)}\right]^{1/2} \exp\left\{\frac{i}{\hbar}\left[\int_{\mathbf{x}_a}^{\mathbf{x}_b} \mathbf{P}\cdot d\mathbf{s} - H\,(t_b - t_a)\right]\right\} \tag{3.191} $$
for the special case where the potentials $\mathbf{A}(\mathbf{x})$ and
$\phi(\mathbf{x})$ have no explicit time dependence. We assume in the
following that $p(\mathbf{x}_a) = p(\mathbf{x}_b)$; i.e., the kinetic
momentum is the same at the start and end points. This is equivalent
to the assumption that $\phi(\mathbf{x}_a) = \phi(\mathbf{x}_b)$; i.e.,
the electrostatic potential is the same at the start and end points.
The resultant amplitude $\psi(\mathbf{x}_b, t_b)$ is the sum of the
amplitudes for the two paths, namely,


$$ \psi(\mathbf{x}_b, t_b) = \psi_I(\mathbf{x}_b, t_b) + \psi_{II}(\mathbf{x}_b, t_b), \tag{3.192} $$


where $\psi_I(\mathbf{x}_b, t_b)$ and $\psi_{II}(\mathbf{x}_b, t_b)$
are the amplitudes corresponding to the upper and lower paths in
Figure 3.5, respectively. We can write this equivalently as


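$$ \psi(\mathbf{x}_b, t_b) = \psi(\mathbf{x}_a, t_a)\left(e^{i\theta_I} + e^{i\theta_{II}}\right), \tag{3.193} $$
where we have defined the phases
$$ \begin{aligned} \theta_I &= \frac{1}{\hbar}\int_{\mathbf{x}_a}^{\mathbf{x}_b} \mathbf{P}_I\cdot d\mathbf{s}_I - \frac{1}{\hbar}\,H\,(t_b - t_a), \\ \theta_{II} &= \frac{1}{\hbar}\int_{\mathbf{x}_a}^{\mathbf{x}_b} \mathbf{P}_{II}\cdot d\mathbf{s}_{II} - \frac{1}{\hbar}\,H\,(t_b - t_a). \end{aligned} \tag{3.194} $$
The intensity in the plane of the screen $z_b$ is given by
$$ I(\mathbf{x}_b) = |\psi(\mathbf{x}_b, t_b)|^2. \tag{3.195} $$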


It is straightforward to show that this is equivalent to 


$$ I(\mathbf{x}_b) = 4\,I(\mathbf{x}_a)\cos^2\left(\frac{\theta_{II} - \theta_I}{2}\right). \tag{3.196} $$
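
The equivalence of (3.195) and (3.196) can be checked numerically. The
short Python sketch below sums two unit-modulus path amplitudes
directly and compares the result with the closed form; the reference
phase of 0.7 rad and the sampled phase differences are arbitrary
illustrative values.

```python
import numpy as np


def two_path_intensity(theta_1, theta_2, initial_intensity=1.0):
    """|exp(i*theta_1) + exp(i*theta_2)|^2 scaled by the initial intensity."""
    amplitude = np.exp(1j * theta_1) + np.exp(1j * theta_2)
    return initial_intensity * np.abs(amplitude) ** 2


theta_1 = 0.7                                    # arbitrary reference phase
delta = np.linspace(0.0, 4.0 * np.pi, 9)         # phase differences, step pi/2
direct = two_path_intensity(theta_1, theta_1 + delta)
closed_form = 4.0 * np.cos(delta / 2.0) ** 2     # right side of (3.196)
```

The result depends only on the phase difference, not on the common
reference phase, which is why the time-dependent terms drop out.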
The time dependence in (3.194) subtracts to zero, corresponding
to a standing wave. Constructive interference occurs for
$\theta_{II} - \theta_I = 2n\pi$, and destructive interference occurs
for $\theta_{II} - \theta_I = (2n + 1)\pi$, where the integer $n$
represents the order. The phase difference is given by


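$$ \begin{aligned} \theta_{II} - \theta_I &= \frac{1}{\hbar}\oint \mathbf{P}\cdot d\mathbf{s} \\ &= \frac{1}{\hbar}\oint \mathbf{p}\cdot d\mathbf{s} + \frac{q}{\hbar}\oint \mathbf{A}\cdot d\mathbf{s}, \end{aligned} \tag{3.197} $$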


where the integral is around the closed path. Next we define a
phase shift $\Delta\theta$ corresponding to the difference in phase
between the solenoid being excited to a specific value and the
solenoid current turned off. This is


$$ \Delta\theta = \frac{q}{\hbar}\oint \mathbf{A}\cdot d\mathbf{s}. \tag{3.198} $$
Applying Stokes’s theorem we write


$$ \begin{aligned} \Delta\theta &= \frac{q}{\hbar}\int_{\Sigma} (\nabla\times\mathbf{A})\cdot d\mathbf{S} \\ &= \frac{q}{\hbar}\int_{\Sigma} \mathbf{B}\cdot d\mathbf{S}, \end{aligned} \tag{3.199} $$


where $\Sigma$ is any surface bounded by the closed ray paths.
Equivalently,
$$ \Delta\theta = \frac{q\,\Phi}{\hbar}, \tag{3.200} $$
where $\Phi$ is the total magnetic flux enclosed by the ray paths. The
magnetic vector potential is nonzero in the vicinity of the classi- 
cal trajectories, but the magnetic field is zero there. Therefore, no 
magnetic Lorentz force acts on the electron. This result was first 
predicted by Ehrenberg and Siday [25], and later expanded upon 
by Aharonov and Bohm [2]. 
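
To give a feel for the magnitudes involved in (3.200), the following
Python sketch evaluates the phase shift for an electron and the
enclosed flux that displaces the pattern by one full fringe. The
solenoid radius and field strength are hypothetical values chosen only
for illustration, and the function names are not taken from the text.

```python
import math

H_PLANCK = 6.62607015e-34    # Planck constant, J s
HBAR = H_PLANCK / (2.0 * math.pi)
E_CHARGE = 1.602176634e-19   # elementary charge, C


def magnetic_phase_shift(flux_wb, charge=-E_CHARGE):
    """Phase shift (3.200): charge * enclosed flux / hbar, in radians."""
    return charge * flux_wb / HBAR


def flux_per_fringe(charge=-E_CHARGE):
    """Enclosed flux that shifts the pattern by one full fringe (2*pi)."""
    return H_PLANCK / abs(charge)


# Hypothetical solenoid: 1 micrometre radius, 1 mT axial field.
radius = 1.0e-6
b_field = 1.0e-3
flux = b_field * math.pi * radius ** 2
shift = magnetic_phase_shift(flux)
fringes = abs(shift) / (2.0 * math.pi)
```

One fringe corresponds to an enclosed flux of $h/|q|$, roughly
$4.14\times10^{-15}$ Wb for an electron, so even a micrometre-scale
flux tube at millitesla fields produces a shift of order one fringe.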


An electrostatic analog was first predicted by Aharonov and Bohm 
[2]. This is shown schematically in Figure 3.6. An electron travers- 
ing the upper path passes through a conducting tube. When the 
electron is inside the tube, near its center, an electrostatic poten- 
tial V(t) is momentarily applied to the tube by an external source. 
Assuming the length of the tube is much larger than its diameter, 
the electron experiences no electric field during the time interval in 
which V(t) is switched on. Consequently, no electrostatic Lorentz 
force is exerted on the electron. 


This is only possible in principle if the electron is represented by a 
wave packet, rather than a monochromatic plane wave. The wave 
packet must be sufficiently localized for the above condition to 
be met, whereas a monochromatic plane wave has infinite extent. 
This inevitably requires a spread in momentum as well, consistent 
with the Heisenberg uncertainty principle. Measurable intensity at
$z_b$ only occurs when the path difference is less than the coherence
length of the wave packet.


Figure 3.6: Two-slit interference in presence of an electrostatic
potential.


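Since the electrostatic potential $\phi(\mathbf{x}, t)$ depends
explicitly on time, we must use the time-dependent formulation. The
wave function $\psi(\mathbf{x}_b, t_b)$ is given (3.132, 3.145) by


$$ \psi(\mathbf{x}_b, t_b) = \psi(\mathbf{x}_a, t_a)\exp\left[\frac{i}{\hbar}\int_{t_a}^{t_b} L(\mathbf{x}, \mathbf{v}; t)\,dt\right], \tag{3.201} $$
where $L(\mathbf{x}, \mathbf{v}; t)$ is the classical Lagrangian given by (2.9) as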


$$ L(\mathbf{x}, \mathbf{v}; t) = -mc^2\sqrt{1 - v^2/c^2} + q\,\mathbf{v}\cdot\mathbf{A}(\mathbf{x}, t) - q\,\phi(\mathbf{x}, t). \tag{3.202} $$
The magnetic vector potential $\mathbf{A}$ is assumed to be zero in
this case. The amplitude $\psi(\mathbf{x}_b, t_b)$ is again the sum of
the amplitudes for the two alternative paths (3.192, 3.193), with the
respective phases given by


$$ \begin{aligned} \theta_I &= \frac{1}{\hbar}\int_{t_a}^{t_b} L_I\,dt_I, \\ \theta_{II} &= \frac{1}{\hbar}\int_{t_a}^{t_b} L_{II}\,dt_{II}. \end{aligned} \tag{3.203} $$


The intensity measured in the plane at $z_b$ again depends on the
difference between these two phases. The phase shift between the
cases with the potential $V(t)$ switched on and off is


$$ \Delta\theta = \frac{q}{\hbar}\int V(t)\,dt. \tag{3.204} $$


This is independent of the electron energy, and is therefore the 
same for all constituent energies in the wave packet. 
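
As a rough numerical illustration of (3.204), the Python sketch below
integrates a voltage pulse over time and converts the result to a
phase shift for an electron. The rectangular 1 microvolt, 1 nanosecond
pulse is a hypothetical example, and the function name is an
assumption made for this sketch.

```python
import numpy as np

HBAR = 1.054571817e-34       # reduced Planck constant, J s
E_CHARGE = 1.602176634e-19   # elementary charge, C


def electrostatic_phase_shift(t, v, charge=-E_CHARGE):
    """Phase shift (3.204): (charge / hbar) * integral of V(t) dt."""
    integral = np.sum(0.5 * (v[1:] + v[:-1]) * np.diff(t))  # trapezoidal rule
    return charge * integral / HBAR


# Hypothetical rectangular pulse: 1 microvolt applied for 1 nanosecond.
t = np.linspace(0.0, 1.0e-9, 1001)
v = np.full_like(t, 1.0e-6)
shift = electrostatic_phase_shift(t, v)
```

The electron energy does not enter the calculation, in line with the
remark that the shift is common to all components of the wave packet.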


The phase shifts (3.198, 3.204) result in a measurable lateral shift
in the fringe pattern on the screen at $z_b$ in principle. The
solutions (3.191, 3.201) for the wave function have the striking
property that they depend only on the magnetic vector potential
$\mathbf{A}(\mathbf{x}, t)$, and the electrostatic scalar potential
$\phi(\mathbf{x}, t)$. Nowhere do the magnetic field $\mathbf{B}$ or the
electric field $\mathbf{E}$ appear. This is distinctly different
from the classical description, in which these fields appear
explicitly in the Lorentz force law (2.15). Indeed no Lorentz force 
is present in this quantum mechanical description. The reader is 
referred to [3, 87] for further elaboration, including a description 
of experimental results. 


3.2.4 The Klein-Gordon equation and the covariant wave function


The effects of special relativity become important when the kinetic
energy of the particle is comparable to, or greater than $mc^2$,
where m is the rest mass. A correct treatment must also include 
the effects of spin. The reader is referred to the book by Bjorken 
and Drell [6]. 


In many practical instruments, spin does not play an important 
role in the optics, however. A useful approximation is available in 
the Klein-Gordon equation which ignores spin, but retains Lorentz 
covariance. As before, we confine our attention to the lab frame
only.


Following the arguments of the preceding sections, we assume that
$$ \psi(\mathbf{x}, t) = \psi(\mathbf{x}_a, t_a)\exp\left[\frac{i}{\hbar}\int_{t_a}^{t} L(\mathbf{x}, \mathbf{v}; t)\,dt\right], \tag{3.205} $$


where $L(\mathbf{x}, \mathbf{v}; t)$ is the relativistic classical
Lagrangian given in the lab frame by (2.9) as


$$ L(\mathbf{x}, \mathbf{v}; t) = -mc^2\sqrt{1 - v^2/c^2} + q\,\mathbf{v}\cdot\mathbf{A}(\mathbf{x}, t) - q\,\phi(\mathbf{x}, t). \tag{3.206} $$


From (2.25, 2.30, 2.31) the classical energy-momentum relation- 
ship is given by 


$$ [H - q\phi(\mathbf{x}, t)]^2 = [\mathbf{P} - q\mathbf{A}(\mathbf{x}, t)]^2 c^2 + m^2c^4. \tag{3.207} $$


This equation is Lorentz-invariant, since it contains the square of 
the difference of two four vectors $(\mathbf{P}, iH/c)$ and $(q\mathbf{A}, iq\phi/c)$. As
such, it has the same form in every uniformly moving reference 
frame. Again invoking the fundamental postulate that classical 
quantities are replaced by their quantum mechanical operators, 
this becomes 


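$$ \left(i\hbar\frac{\partial}{\partial t} - q\phi\right)^2 \psi(\mathbf{x}, t) = (-i\hbar\nabla - q\mathbf{A})^2 c^2\,\psi(\mathbf{x}, t) + m^2c^4\,\psi(\mathbf{x}, t). \tag{3.208} $$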
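Applying the operator in large parentheses twice in succession, the
left side is


$$ \left(i\hbar\frac{\partial}{\partial t} - q\phi\right)^2 \psi(\mathbf{x}, t) = -\hbar^2\frac{\partial^2\psi}{\partial t^2} - 2i\hbar q\phi\frac{\partial\psi}{\partial t} - i\hbar q\frac{\partial\phi}{\partial t}\psi + q^2\phi^2\psi. \tag{3.209} $$

Similarly, the first term on the right is


$$ (-i\hbar\nabla - q\mathbf{A})^2 c^2\,\psi(\mathbf{x}, t) = -\hbar^2c^2\nabla^2\psi + 2i\hbar qc^2\,\mathbf{A}\cdot\nabla\psi + i\hbar qc^2\,(\nabla\cdot\mathbf{A})\,\psi + q^2c^2A^2\psi. \tag{3.210} $$
Substituting, we obtain the relativistic time-dependent wave equation
as follows:


$$ \begin{aligned} -\hbar^2\frac{\partial^2\psi}{\partial t^2} &- 2i\hbar q\phi\frac{\partial\psi}{\partial t} - i\hbar q\frac{\partial\phi}{\partial t}\psi + q^2\phi^2\psi \\ &= -\hbar^2c^2\nabla^2\psi + 2i\hbar qc^2\,\mathbf{A}\cdot\nabla\psi + i\hbar qc^2\,(\nabla\cdot\mathbf{A})\,\psi + q^2c^2A^2\psi + m^2c^4\psi. \end{aligned} \tag{3.211} $$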


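Grouping terms and dividing through by $c^2$, we obtain


$$ \begin{aligned} &-\hbar^2\left(\nabla^2\psi - \frac{1}{c^2}\frac{\partial^2\psi}{\partial t^2}\right) + 2i\hbar q\left(\mathbf{A}\cdot\nabla\psi + \frac{\phi}{c^2}\frac{\partial\psi}{\partial t}\right) \\ &\quad + i\hbar q\left(\nabla\cdot\mathbf{A} + \frac{1}{c^2}\frac{\partial\phi}{\partial t}\right)\psi + q^2\left(A^2 - \frac{\phi^2}{c^2}\right)\psi + m^2c^2\psi = 0. \end{aligned} \tag{3.212} $$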


Again, we assume that $\psi(\mathbf{x}, t)$ can be written as


$$ \psi(\mathbf{x}, t) = \exp\left[\frac{i}{\hbar}S(\mathbf{x}, t)\right]. \tag{3.213} $$


Substituting this into the relativistic wave equation, it is tedious 
but straightforward to show that 


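$$ (\nabla S - q\mathbf{A})^2 - i\hbar\,\nabla\cdot(\nabla S - q\mathbf{A}) - \frac{1}{c^2}\left(\frac{\partial S}{\partial t} + q\phi\right)^2 + \frac{i\hbar}{c^2}\frac{\partial}{\partial t}\left(\frac{\partial S}{\partial t} + q\phi\right) + m^2c^2 = 0. \tag{3.214} $$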


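Again we expand $S(\mathbf{x}, t)$ in powers of $\hbar$, in which we define


$$ S(\mathbf{x}, t) = S_0(\mathbf{x}, t) + \hbar\,S_1(\mathbf{x}, t) + \hbar^2 S_2(\mathbf{x}, t) + \ldots \tag{3.215} $$
Substituting and grouping terms according to powers of $\hbar$, this
leads after some algebra to


$$ \begin{aligned} 0 = {}& (\nabla S_0 - q\mathbf{A})^2 - \frac{1}{c^2}\left(\frac{\partial S_0}{\partial t} + q\phi\right)^2 + m^2c^2 \\ &+ \hbar\left[2(\nabla S_0 - q\mathbf{A})\cdot\nabla S_1 - i\,\nabla\cdot(\nabla S_0 - q\mathbf{A}) - \frac{2}{c^2}\left(\frac{\partial S_0}{\partial t} + q\phi\right)\frac{\partial S_1}{\partial t} + \frac{i}{c^2}\frac{\partial}{\partial t}\left(\frac{\partial S_0}{\partial t} + q\phi\right)\right] \\ &+ O(\hbar^2). \end{aligned} \tag{3.216} $$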


Again anticipating the classical limit where $\hbar \to 0$, we set
each of the coefficients of the powers of $\hbar$ equal to zero. This
leads to the coupled set of equations as before. Taking the first
equation in the series, we have


$$ (\nabla S_0 - q\mathbf{A})^2 - \frac{1}{c^2}\left(\frac{\partial S_0}{\partial t} + q\phi\right)^2 + m^2c^2 = 0. \tag{3.217} $$


We now consider the special case that the potentials are
time-independent. We again define a new function $W_0(\mathbf{x})$
given by


$$ S_0(\mathbf{x}, t) = W_0(\mathbf{x}) - Ht, \tag{3.218} $$
where $H$ is the constant, conserved total energy eigenvalue.
Substituting, this gives

$$ (\nabla W_0 - q\mathbf{A})^2 = \frac{1}{c^2}(-H + q\phi)^2 - m^2c^2, \tag{3.219} $$


where we note that $\nabla W_0 = \nabla S_0$. We now make use of the
relativistic energy-momentum relation


$$ (H - q\phi)^2 = p^2c^2 + m^2c^4 \tag{3.220} $$
to obtain
$$ (\nabla W_0 - q\mathbf{A})^2 = [p(\mathbf{x})]^2, \tag{3.221} $$


where $p$ is now the relativistic scalar kinetic momentum. This is
satisfied by
$$ \nabla W_0 = \mathbf{P}(\mathbf{x}), \tag{3.222} $$
which we immediately recognize from the relationship (2.25) between
the canonical momentum $\mathbf{P}$ and the kinetic momentum
$\mathbf{p}$. We have neglected the negative root. This means that we
only consider motion in the forward direction for the present purpose.
Integrating, we obtain


$$ S_0(\mathbf{x}_b, t_b) = S_0(\mathbf{x}_a, t_a) + \int_{\mathbf{x}_a}^{\mathbf{x}_b} \mathbf{P}\cdot d\mathbf{x} - H\,(t_b - t_a). \tag{3.223} $$
We immediately recognize this set as being identical with the non- 
relativistic approximation, except that the relativistic quantities 
$p$, $\mathbf{P}$, and $H$ here replace their non-relativistic counterparts used
previously. Noting that 

$$ \frac{\partial S_1}{\partial t} = \frac{\partial S_2}{\partial t} = \cdots = 0, \tag{3.224} $$


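the preceding analysis applies, and we obtain the solution for the
wave function $\psi(\mathbf{x}, t)$ for the case of time-independent
potentials as


$$ \psi(\mathbf{x}_b, t_b) = \psi(\mathbf{x}_a, t_a)\left[\frac{p(\mathbf{x}_a)}{p(\mathbf{x}_b)}\right]^{1/2} \exp\left\{\frac{i}{\hbar}\left[\int_{\mathbf{x}_a}^{\mathbf{x}_b} \mathbf{P}\cdot d\mathbf{x} - H\,(t_b - t_a)\right]\right\}. \tag{3.225} $$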


Given an initial condition $\psi(\mathbf{x}_a, t_a)$ this describes the
propagation of $\psi(\mathbf{x}_b, t_b)$ to any end point in the
presence of static fields, where $H$ is the conserved total energy.
This represents a single eigenstate corresponding to the energy $H$.
As before, individual eigenstates are linearly superimposed to build
up the state function $\Psi(\mathbf{x}, t)$, which reflects the
experimental preparation of the beam. The measurable intensity is
given by $|\Psi(\mathbf{x}, t)|^2$.


In the free-particle case where $\phi = 0$ and $\mathbf{A} = 0$, the
equation (3.212) reduces to


$$ \left(\nabla^2 - \frac{1}{c^2}\frac{\partial^2}{\partial t^2} - \frac{m^2c^2}{\hbar^2}\right)\psi(\mathbf{x}, t) = 0. \tag{3.226} $$
This is known as the Klein-Gordon equation. It has non-normalized
plane wave solutions given by


$$ \psi(\mathbf{x}, t) = \exp\left[\frac{i}{\hbar}(\mathbf{p}\cdot\mathbf{x} - Ht)\right]. \tag{3.227} $$


The reader can verify that this is the correct solution by direct
substitution into the Klein-Gordon equation. One can also verify that
$H^2 = p^2c^2 + m^2c^4$ for the free-particle case with $\phi = 0$ and
$\mathbf{A} = 0$. The interpretation of $|\psi(\mathbf{x}, t)|^2$ in
terms of probability density is more subtle than in the
nonrelativistic approximation. At modest energies, $|\psi|^2$ remains
a very good approximation to the relativistic probability density,
however. The reader is referred to Bjorken and Drell [6] for a
detailed discussion.
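
The substitution check suggested above can also be carried out
numerically. The Python sketch below builds the momentum and total
energy of a free electron from its kinetic energy, evaluates the
Klein-Gordon operator acting on the plane wave (3.227), and reports
the wavelength $h/p$. The 100 keV beam energy and the function names
are illustrative assumptions; $H$ is taken to include the rest energy.

```python
import math

H_PLANCK = 6.62607015e-34        # Planck constant, J s
C_LIGHT = 299792458.0            # speed of light, m/s
M_E = 9.1093837015e-31           # electron rest mass, kg
EV = 1.602176634e-19             # joules per electronvolt


def free_particle_state(kinetic_energy_ev, mass=M_E):
    """Return (p, H, wavelength) for a free particle of given kinetic energy.

    H includes the rest energy, H = T + m c^2, so that
    H^2 = p^2 c^2 + m^2 c^4.
    """
    rest = mass * C_LIGHT ** 2
    t = kinetic_energy_ev * EV
    total = t + rest
    p = math.sqrt(t * (t + 2.0 * rest)) / C_LIGHT
    return p, total, H_PLANCK / p


def klein_gordon_residual(p, total, mass=M_E):
    """Dimensionless residual of (3.226) applied to the plane wave (3.227)."""
    hbar = H_PLANCK / (2.0 * math.pi)
    k2 = (p / hbar) ** 2                       # from the Laplacian
    w2 = (total / hbar) ** 2 / C_LIGHT ** 2    # from the time derivative
    m2 = (mass * C_LIGHT / hbar) ** 2
    return (-k2 + w2 - m2) / m2


p, total, wavelength = free_particle_state(100.0e3)   # 100 keV electron
residual = klein_gordon_residual(p, total)
```

For a 100 keV electron the wavelength comes out near 3.7 pm, and the
residual vanishes to within rounding error.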


In summary, all relevant information about quantum mechanical 
particle motion in general, time-independent potentials A(x) and 
$\phi(\mathbf{x})$ is contained in the relativistic wave function (3.225).


3.2.5 Physical interpretation of the wave function and its practical application


The central problem in optics is to understand the intensity distri- 
bution in a given transverse plane of an optical system. This might 
be the image plane of an electron microscope, the Fourier plane 
of a diffractometer, or the dispersion plane of an energy-dispersive 
charged particle spectrometer, to name a few examples. The inten- 
sity distribution is proportional to the probability distribution for 
finding a single particle at a given position. This in turn is given 
by the absolute square of the wave function, which we have called 
$\psi(\mathbf{x}, t)$. In the preceding analysis we have focused on calculating
the wave function for a single charged particle moving in a general 
electromagnetic potential. The aim of the present section is to un- 
derstand how to translate this into an areal intensity distribution, 
such as one would measure in a practical instrument.


In classical geometrical optics, the beam can be regarded as a
family of closely spaced trajectories. Each individual trajectory
is calculated by solving the Euler-Lagrange equations of motion,
which are in turn derived from Hamilton’s principle of least action.
The collective properties of these trajectories immediately lead to
conservation of phase space volume, which in turn leads to the law
of Helmholtz-Lagrange and brightness conservation, as derived in
Chapter 2.


In quantum mechanical wave optics, one starts with a surface of 
constant phase called a wave front. This forms an initial condi- 
tion, from which the wave propagates through space-time. This 
is shown schematically in Figure 3.7. At time $t$ the wave front is
depicted by the upper curve. At a later time $t + \Delta t$ the wave
front has propagated, forming a new curve. The wave fronts are
actually surfaces in three-dimensional coordinate space. In the figure
we depict a planar slice through the wave front, which is a curve
on the page.


Figure 3.7: Huygens’ principle.


We imagine a collection of point sources distributed over the initial
wave front, with each point source radiating a spherical wave.
The point sources are assumed to be infinite in number, and
infinitesimally separated along the initial wave front. They are also
assumed to radiate in phase, or coherently relative to one another.
At the time $t + \Delta t$ the spherical waves have propagated to form
the envelope of the new wave front. This equivalent picture of
wave propagation is known as Huygens’ principle. It will prove
to be indispensable to formulating a mathematical description of
diffraction, which is derived in later sections.


Next, we consider each point source to form the initial point of
a wave function $\psi(\mathbf{x}_a, t_a)$. From the preceding analysis,
the final state wave function $\psi(\mathbf{x}_b, t_b)$ is calculated
from the solutions (3.169, 3.186), depending on whether the
electromagnetic potentials have explicit time dependence or not. The
wave functions corresponding to the separate point sources add
coherently to form the composite wave front. The fact that the
composite wave front has a specific curvature says that the point
sources have a corresponding relative position in space-time, as well
as a specific phase relationship to one another. This phase advances
monotonically through space-time, as given by the action integral
divided by $\hbar$. Each point source has an associated classical
trajectory, as depicted schematically in Figure 3.4.


The mathematical description is exact in principle, and accounts 
for all aberrations. In geometrical optics, the aberrations are man- 
ifest as a displacement of the classical particle trajectory from the 
paraxial approximation. This displacement can be calculated in 
principle to an arbitrary degree of accuracy. In wave optics, the 
aberrations are manifest as displacements in the surfaces of con- 
stant phase. 


The intensity is proportional to the probability density, which in
turn is related to the wave function as $|\psi(\mathbf{x}, t)|^2$.
Obviously the phase does not appear explicitly here. It is the
relative phases of neighboring trajectories that govern the shape of
the wave fronts. As a probability, the wave function for each point
source must satisfy
$$ \int |\psi(\mathbf{x}, t)|^2\,d^3x = 1. \tag{3.228} $$


This implies a normalization constant multiplying the wave func- 
tion. As Feynman and Hibbs point out [29], there seems to be no 
simple general procedure for calculating this constant. Even for 
the simple case of a free-particle plane wave, we had to resort to 
the device of periodic boundary conditions. Fortunately, the ab- 
solute probability is unimportant here. What is important is the 
relative probability for the various point sources. This determines 
the relative intensity across the beam. This becomes part of spec- 
ifying the initial condition for each point source. 


The fact that each point source radiates a spherical wave is
equivalent to the initial momentum direction being completely
unspecified. From the Heisenberg uncertainty principle, this is
consistent with the initial point $(\mathbf{x}_a, t_a)$ being
precisely specified for each point source. The initial longitudinal
momentum is precisely specified for each classical trajectory.
However, the Heisenberg principle has no classical analog. We must
also remember that the classical trajectory and the quantum
mechanical wave function both apply to a single particle.


This completes the physical picture which connects the wave func- 
tion to the intensity distribution of a practical system. We are now 
in a position to discuss the intensity distribution for a given practi- 
cal system in a more general way, through the theory of diffraction. 
This forms the topic of the following sections. 


3.3 Diffraction 


Diffraction is the phenomenon which results from the propagation,
spreading, and interference of waves. In experimental optics,
interference is manifest as alternating bright and dark intensity
bands called fringes. This is a purely wave-optical phenomenon
with no analog in classical mechanics. Its origin dates back several
hundred years to the pioneering work of da Vinci, Grimaldi,
and Huygens.


In modern terms, the motion of a single charged particle in an 
electromagnetic potential is described quantum mechanically by 
a wave function, for which the absolute square is the probability 
density that a single measurement will find the particle at pre- 
cise space-time coordinates. A state which is a superposition of 
two or more eigenstates with identical energy, but differing direc- 
tions of momentum exhibits interference. Diffraction and interfer- 
ence are fundamental to a complete description of charged particle 
optics. 


The purpose of this section is to place the concept of diffraction 
on a firm conceptual and mathematical basis, and then, based 
on this, to describe several useful examples. Before embarking on 
this, it is worthwhile to convey an intuitive feel for the subject by 
considering a simple thought experiment, which was described by
Feynman et al. [30, Chapter 1, Volume 3]. This is shown schematically
in Figure 3.8. We imagine a single charged particle with precisely
known momentum and energy, incident perpendicularly on
an opaque screen $\Sigma$ with two parallel slits. We assume that the
transverse position of the particle is completely unknown. Conse- 
quently, the particle could be stopped by the screen, or it could 
pass through one of the two slits. Assuming it passes through one 
of the slits, it is impossible to know which slit the particle passed 
through. Having passed through one of the slits, the particle drifts 
to a phosphor screen P at the bottom of the figure, where a flash of 
light is emitted on impact. This represents a measurement of the 
transverse position of the single particle on the phosphor screen. 
By itself, this measurement does not reveal much information, 
since the particle could land practically anywhere. This is cor- 
roborated by the fact that a second particle generally lands at a 
different place from the first particle.


Figure 3.8: Two-slit thought experiment.


Next, we repeat this measurement for many, many individual par- 
ticles. This is easily accomplished by forming a beam of particles. 
For sufficiently low beam current, the individual beam particles 
are far enough apart on average that they do not interact with one 
another. One by one, particles arrive at the screen and produce 
a flash of light. This is conceptually equivalent to repeating the 
measurement many times, once for each particle, with identical
preparation of the quantum mechanical state for each measurement.
The remarkable result is that bright and dark fringes are
observed on the phosphor screen. This is represented by the
intensity distribution as a function of transverse position $x$ plotted
at the bottom of the figure. This result has been observed directly
for a variety of particle species, indicating that this is more than
just a thought experiment [91, Page 1068, Chapter 38].


Analysis reveals that the bright bands occur where the path length
difference $d\sin\theta$ between the two possible paths equals an
integral number of wavelengths $\lambda$. Dark bands occur where the
path length difference equals a half-odd number of wavelengths. The
wavelength is related to the particle momentum $p$ by the de Broglie
relation


$$ \lambda = \frac{h}{p}, \tag{3.229} $$


where h is Planck’s constant. According to Einstein’s hypothe- 
sis, light propagates in the form of discrete energy packets called 
photons, where each photon obeys this same relationship between 
momentum and wavelength. Indeed, the same two-slit interference 
was observed much earlier for light by Young. This experiment 
and many related topics are authoritatively described by Born 
and Wolf [11]. This is one of many illustrations of the close corre- 
spondence between light optics and particle optics. 
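
The fringe conditions quoted above are easy to evaluate for concrete
numbers. The following Python sketch computes the nonrelativistic de
Broglie wavelength and the ideal two-slit intensity at the angles
where the path difference $d\sin\theta$ is an integer or half-odd
multiple of the wavelength. The 100 eV electron energy and the 1
micrometre slit separation are hypothetical values, the slits are
assumed to be infinitely narrow, and the function names are not taken
from the text.

```python
import numpy as np

H_PLANCK = 6.62607015e-34     # Planck constant, J s
M_E = 9.1093837015e-31        # electron rest mass, kg
EV = 1.602176634e-19          # joules per electronvolt


def de_broglie_wavelength(kinetic_energy_ev, mass=M_E):
    """Nonrelativistic de Broglie wavelength, lambda = h / p."""
    p = np.sqrt(2.0 * mass * kinetic_energy_ev * EV)
    return H_PLANCK / p


def fringe_intensity(sin_theta, slit_separation, wavelength):
    """Ideal two-slit pattern for narrow slits, in units of one slit alone."""
    phase_difference = 2.0 * np.pi * slit_separation * sin_theta / wavelength
    return 4.0 * np.cos(phase_difference / 2.0) ** 2


wavelength = de_broglie_wavelength(100.0)   # hypothetical 100 eV electrons
d = 1.0e-6                                  # hypothetical slit separation, m
orders = np.arange(-2, 3)
bright = fringe_intensity(orders * wavelength / d, d, wavelength)
dark = fringe_intensity((orders + 0.5) * wavelength / d, d, wavelength)
```

With these numbers the wavelength is about 0.12 nm, so adjacent bright
fringes are separated by an angle of roughly $1.2\times10^{-4}$ rad.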


As a related intuitive concept, we next consider the propagation 
of a wave front through space and time, as described by Huygens’
principle. This will prove to be indispensable to formulating
a mathematical description of diffraction, which is derived in the 
following sections. It will not be necessary to specifically invoke 
the discreteness of particles. Rather we will take a more traditional 
approach, regarding the wave function as continuous in space and 
time. We will develop a scalar theory, where the optical distur- 
bance is adequately described by the scalar wave function. This is 
permissible, because we consider only particle motion in a vacuum, 
which is inherently isotropic. We will ignore the intrinsic spin, since 
it is not needed for this discussion. The reader is referred in the
following to two definitive texts by Born and Wolf [11], and by
Goodman [36] for detailed and comprehensive discussion.


3.3.1 The Fresnel-Kirchhoff relation


In mathematical terms, the central problem in diffraction theory is 
to calculate the amplitude u(x), given specified, known boundary 
conditions. For a free particle this is a solution to the Helmholtz 
equation, given by 

$$ (\nabla^2 + k^2)\,u(\mathbf{x}) = 0, \tag{3.230} $$


where k is a constant, and u(x) is the spatial part of the wave 
function. This is precisely the scalar wave equation applicable to 
light, in which case $k = \omega/c$, and $c$ is the speed of light.
Allowing for this, we therefore anticipate that the results to follow
are otherwise equally valid for a photon and a charged particle. In
this section, we describe a Green’s function approach originally
derived by Sommerfeld [85] for light optics to achieve this. This
methodology is known as the Rayleigh-Sommerfeld solution. The reader
is referred to the text by Goodman [36] for a comprehensive dis- 
cussion, including the interesting historical attempts to correctly 
understand this problem. 


For the present purpose, we assume the particle propagates freely, 
in the absence of electric and magnetic fields. We begin by stating 
a very general result, which will prove to be useful. We assume two 
arbitrary, complex functions U(x) and V(x), where these functions 
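are finite and differentiable over an arbitrary, closed volume $\tau$.
We form the quantity $U\,\nabla^2 V - V\,\nabla^2 U$, and integrate this
over the volume $\tau$. It follows that


$$ \int_{\tau} \left[U\,\nabla^2 V - V\,\nabla^2 U\right] d\tau = \int_{\tau} \nabla\cdot\left[U\,\nabla V - V\,\nabla U\right] d\tau. \tag{3.231} $$
We now make use of the fact that, for any vector field $\mathbf{C}(\mathbf{x})$
$$ \int_{\tau} \nabla\cdot\mathbf{C}\,d\tau = \int_{S} \mathbf{C}\cdot d\mathbf{S}. \tag{3.232} $$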


This expresses the fact that the volume integral of the divergence
of $\mathbf{C}$ is equivalent to the integral of the outward normal
component of $\mathbf{C}$ over the surface $S$ enclosing the volume
$\tau$. This general result is called the divergence theorem. Applying
this to the present problem, we find


$$ \int_{\tau} \left[U(\mathbf{x})\,\nabla^2 V(\mathbf{x}) - V(\mathbf{x})\,\nabla^2 U(\mathbf{x})\right] d\tau = \int_{S} \left(U(\mathbf{x})\frac{\partial}{\partial n}V(\mathbf{x}) - V(\mathbf{x})\frac{\partial}{\partial n}U(\mathbf{x})\right) dS, \tag{3.233} $$


where the right side is the surface integral over the surface $S$
enclosing the volume $\tau$. The quantity $n$ represents the coordinate
along a direction locally perpendicular to the surface $S$, oriented
outward from the volume $\tau$. The partial derivative with respect to
$n$ is thus the normal gradient of the function.


The relationship (3.233) between the volume and surface integrals 
is called Green’s theorem. As the functions U and V are arbitrary, 
this result is quite general. We will now proceed to apply it to the 
present problem. 


First we consider the special case where u(x) depends only on 
the magnitude r = |x|. In spherical coordinates, the Helmholtz 
equation is 
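$$ \frac{1}{r}\frac{d^2}{dr^2}\left[r\,u(r)\right] + k^2 u(r) = 0. \tag{3.234} $$


This is integrated immediately to give


$$ r\,u(r) = \exp(\pm ikr), \quad \text{i.e.,} \quad u(r) = \frac{1}{r}\exp(\pm ikr), \tag{3.235} $$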


which represents a spherical wave about the origin $r = 0$.
Multiplying this by $\exp(-i\omega t)$, it is evident that the positive
exponent represents an outgoing spherical wave, while the negative exponent
represents an incoming spherical wave. The positive and negative 
exponentials represent two linearly independent solutions to the 
Helmholtz equation, as required for any second order, ordinary 
differential equation. 
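
That both signs in (3.235) solve the radial equation (3.234) can be
confirmed with a simple finite-difference check. The Python sketch
below is illustrative only: the wavelength is set to one arbitrary
unit, the grid avoids the singular point at the origin, and the
function name is an assumption. A wave without the $1/r$ factor is
included as a counterexample.

```python
import numpy as np


def radial_helmholtz_residual(u, r, k):
    """Finite-difference residual of (1/r) d^2(r u)/dr^2 + k^2 u on a uniform grid."""
    dr = r[1] - r[0]
    ru = r * u
    second = (ru[2:] - 2.0 * ru[1:-1] + ru[:-2]) / dr ** 2
    return second / r[1:-1] + k ** 2 * u[1:-1]


k = 2.0 * np.pi                        # wavelength of 1 in arbitrary units
r = np.linspace(1.0, 3.0, 20001)       # stay away from the singular point r = 0
outgoing = np.exp(1j * k * r) / r
incoming = np.exp(-1j * k * r) / r
not_a_solution = np.exp(1j * k * r)    # same phase but without the 1/r factor

res_out = np.max(np.abs(radial_helmholtz_residual(outgoing, r, k)))
res_in = np.max(np.abs(radial_helmholtz_residual(incoming, r, k)))
res_bad = np.max(np.abs(radial_helmholtz_residual(not_a_solution, r, k)))
```

The residuals for the two spherical waves are limited only by the
discretization error, whereas the counterexample leaves a residual of
order $2k/r$.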


We assume an opaque, planar screen of infinite extent, with one 
or more apertures or openings of arbitrary shape and position in 
the screen. This is shown schematically in Figure 3.9. The screen
is assumed to be illuminated by an arbitrary collection of sources
(not shown), such that the amplitude at position $\mathbf{x}_0$ in the
plane of the screen is $u(\mathbf{x}_0)$. This amplitude is assumed to
be known. Given this information, we wish to evaluate the amplitude
$u(\mathbf{x})$ at a remote observation point at position $\mathbf{x}$.
This point is designated by the point $P$ in the figure. This
represents the statement of our problem in mathematical terms.


Figure 3.9: Geometry for Sommerfeld’s solution by Green’s function.


Following Sommerfeld [85], we postulate two point sources at P 
and Q, on opposite sides of the screen, and equidistant from it. We 
imagine a spherical wave emanating separately from each of the 
two points P and Q, where the two waves are assumed to radiate 
exactly 180 degrees out of phase relative to one another. The
resultant amplitude $G$ is found by algebraically adding the complex
amplitudes corresponding to the two spherical waves (3.235). This
yields


$$ G = \frac{1}{R}\exp(ikR) - \frac{1}{R_1}\exp(ikR_1), \tag{3.236} $$


where the radii $R$ and $R_1$ are shown in the figure. In the plane of
the screen, $R = R_1$, and consequently, $G = 0$. This will be
crucially important in the following.


Because $G$ is a superposition of two spherical waves, it is
immediately evident that $G$ satisfies the homogeneous Helmholtz
equation


$$ \nabla^2 G + k^2 G = 0 \tag{3.237} $$


everywhere, except at the source points $P$ and $Q$, where $G$ has
singularities (3.236). We assume here that the differentiation is
with respect to the components of $\mathbf{x}$, shown in Figure 3.9.
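
The key property of this construction, that $G$ vanishes everywhere on
the screen, can be verified directly. In the Python sketch below the
screen is assumed to lie in the plane $z = 0$, so that $Q$ is the
mirror image of $P$; the coordinates of $P$, the wavelength, and the
function name are hypothetical choices made for illustration.

```python
import numpy as np


def sommerfeld_green(x, p_point, k):
    """Green's function (3.236): difference of spherical waves from P and its mirror Q.

    The screen is assumed to lie in the plane z = 0 (an illustrative choice),
    so Q is obtained from P by flipping the sign of the z coordinate.
    """
    q_point = p_point * np.array([1.0, 1.0, -1.0])
    r = np.linalg.norm(x - p_point, axis=-1)
    r1 = np.linalg.norm(x - q_point, axis=-1)
    return np.exp(1j * k * r) / r - np.exp(1j * k * r1) / r1


k = 2.0 * np.pi / 0.05                       # hypothetical wavelength 0.05
p_point = np.array([0.3, -0.2, 2.0])         # hypothetical observation point P

rng = np.random.default_rng(0)
on_screen = np.column_stack([rng.uniform(-5, 5, 1000),
                             rng.uniform(-5, 5, 1000),
                             np.zeros(1000)])
off_screen = on_screen + np.array([0.0, 0.0, 0.5])

g_screen = np.max(np.abs(sommerfeld_green(on_screen, p_point, k)))
g_off = np.max(np.abs(sommerfeld_green(off_screen, p_point, k)))
```

On the screen the two distances coincide and the waves cancel; away
from the screen they do not.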


We can now write 


$$ \int_{\tau} \left[u(\mathbf{x})\,\nabla^2 G(\mathbf{x}, \mathbf{x}_0) - G(\mathbf{x}, \mathbf{x}_0)\,\nabla^2 u(\mathbf{x})\right] d\tau = \int_{S} \left(u(\mathbf{x})\frac{\partial}{\partial n}G(\mathbf{x}, \mathbf{x}_0) - G(\mathbf{x}, \mathbf{x}_0)\frac{\partial}{\partial n}u(\mathbf{x})\right) dS, \tag{3.238} $$


where the left side is an integral over an enclosed volume $\tau$, and
the right side is an integral over the surface $S$ enclosing the
volume $\tau$. We have made direct use of Green’s theorem (3.233),
where we have substituted $u$ for $U$, and $G$ for $V$. The volume
$\tau$ is depicted by the shaded area in the figure. A small sphere
about the point $P$ is specifically excluded from $\tau$, as $G$ has a
singularity at $P$. The closed surface $S$ includes the infinite planar
screen, the small sphere about $P$, and is closed by a hemispherical
surface at infinity in the lower half-space of the figure.


The function G defined in (3.236) is called the Green’s function 
for this problem. The actual specification of G in (3.236) is not 
unique, as any well-behaved function G would satisfy Green’s the- 
orem (3.233). In practice, the choice of G, together with its bound- 
ary conditions, is intentionally made in a way which leads to a 
simplification of the problem at hand, as the following will show. 


Because both u and G satisfy the Helmholtz equation, it follows 
immediately that the integrand on the left side of (3.238), and 
hence the left side itself, is identically zero. It should also be added 
that the sources at P and Q are not physical sources. Rather, they 
are merely a mathematical construct to aid in solving for u. 


The task remains to evaluate the surface integral over $S$ on the
right side of (3.238). This is equal to the sum of three individual
surface integrals over the hemispherical surface at infinity, the
small spherical surface of radius $\epsilon$, and the planar screen,
respectively. Considering first the hemispherical surface $S_1$ at
infinity, it is straightforward to show that


$$ \int_{S_1} \left(u(\mathbf{x})\frac{\partial}{\partial n}G(\mathbf{x}, \mathbf{x}_0) - G(\mathbf{x}, \mathbf{x}_0)\frac{\partial}{\partial n}u(\mathbf{x})\right) dS = \int_{\Omega} \left(ik\,u - \frac{\partial u}{\partial n}\right) G R^2\,d\Omega, \tag{3.239} $$


where $d\Omega$ is the solid angle element. As $GR$ is bounded as
$R \to \infty$, it follows that the right side vanishes, as long as


$$ R\left(ik\,u - \frac{\partial u}{\partial n}\right) \to 0 \tag{3.240} $$
as $R \to \infty$. This is, in fact, the case for a purely outgoing
spherical wave. This is known as the Sommerfeld radiation condition.
The surface integral over the hemispherical surface $S_1$ at infinity
is zero. Thus, it makes no contribution to the overall surface integral
over $S$, which is the right side of (3.238).


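Next, we consider the small spherical surface $S_2$ of radius
$\epsilon$ about $P$. As $\epsilon \to 0$, the integrand on the right
side of (3.238) becomes dominated by the first term in the expression
(3.236) for the Green’s function $G$. The outward normal of the volume
$\tau$ on $S_2$ points toward $P$, so that
$\partial/\partial n = -\partial/\partial R$ there. It is
straightforward to show that


$$ \int_{S_2} \left(u(\mathbf{x})\frac{\partial}{\partial n}G(\mathbf{x}, \mathbf{x}_0) - G(\mathbf{x}, \mathbf{x}_0)\frac{\partial}{\partial n}u(\mathbf{x})\right) dS \to 4\pi\,u(\mathbf{x}) \tag{3.241} $$
in the limit $\epsilon \to 0$.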


Finally, we consider the surface $S_0$ of the planar screen. We
assume $u = 0$ on the interior of the opaque portion of the screen,
i.e., the screen is perfectly opaque. We further notice by symmetry
that $R = R_1$ everywhere in the plane of the screen. It follows
(3.236) that $G = 0$ over the entire plane of the screen. In fact,
this is the reason for Sommerfeld’s choice of the two equidistant
point sources at $P$ and $Q$, radiating directly out of phase. The two
spherical waves from the point sources at $P$ and $Q$ thus interfere
destructively at the plane of the screen. This leads to a considerable
simplification in the evaluation of the right side of (3.238), by
eliminating the second term in the integrand. Considering the
surfaces $S_0$ (the screen) and $S_2$ (the small sphere) together, it
follows (3.241) that


$$ 4\pi\,u(\mathbf{x}) = -\int_{S_0} dS_0\,u(\mathbf{x}_0)\,\frac{\partial}{\partial n}G(\mathbf{x}, \mathbf{x}_0), \tag{3.242} $$


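where

\[
\frac{\partial G}{\partial n}
= \frac{\partial G}{\partial R} \frac{\partial R}{\partial n}
+ \frac{\partial G}{\partial R_1} \frac{\partial R_1}{\partial n}
= 2 \cos(\mathbf{n}, \mathbf{R}) \left( \frac{ik}{R} - \frac{1}{R^2} \right) \exp(ikR), \tag{3.243}
\]

remembering that $R = R_1$. In the limit of short wavelength we
have $kR \gg 1$, in which case we can approximate

\[
u(\mathbf{r}, z) = \frac{1}{i\lambda} \int d^2r_0 \, u_0(\mathbf{r}_0, z_0) \,
\frac{\exp\left( ik |\mathbf{x} - \mathbf{x}_0| \right)}{|\mathbf{x} - \mathbf{x}_0|} \,
\cos(\mathbf{n}, \mathbf{x} - \mathbf{x}_0), \tag{3.244}
\]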
where we have substituted $R = |\mathbf{x} - \mathbf{x}_0|$. Also,
$k = 2\pi/\lambda$, where $\lambda$ is the particle wavelength, given
by $\lambda = h/p$. The integral (3.244) need only be calculated over
the open areas in the screen, where $u_0(\mathbf{r}_0, z_0)$ is
nonzero. Here we have expressed the three-vector position
$\mathbf{x}$ as a two-vector position $\mathbf{r}$ in the transverse
plane, and an axial position z; i.e., $\mathbf{x} = (\mathbf{r}, z)$.
We will continue to use this notation throughout.
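
As a numerical illustration of $\lambda = h/p$ and $k = 2\pi/\lambda$, the following Python sketch evaluates the wavelength of an electron from its kinetic energy. The relativistic expression used for the momentum is an assumption of this sketch (the text above only states $\lambda = h/p$), and the function names and the 100 keV example value are purely illustrative.

```python
import math

# Physical constants in SI units (CODATA 2018 values).
PLANCK_H = 6.62607015e-34            # J s
ELECTRON_MASS = 9.1093837015e-31     # kg
LIGHT_SPEED = 299792458.0            # m/s
ELEMENTARY_CHARGE = 1.602176634e-19  # C


def electron_momentum(kinetic_energy_ev):
    """Relativistic momentum p (kg m/s) for a kinetic energy given in eV.

    Uses (p c)^2 = E_kin * (E_kin + 2 m c^2). This relativistic form is an
    assumption of the sketch, not a formula quoted from the text.
    """
    e_kin = kinetic_energy_ev * ELEMENTARY_CHARGE
    rest_energy = ELECTRON_MASS * LIGHT_SPEED ** 2
    return math.sqrt(e_kin * (e_kin + 2.0 * rest_energy)) / LIGHT_SPEED


def electron_wavelength(kinetic_energy_ev):
    """Particle wavelength lambda = h / p in metres."""
    return PLANCK_H / electron_momentum(kinetic_energy_ev)


def wave_number(kinetic_energy_ev):
    """Wave number k = 2 pi / lambda in 1/m."""
    return 2.0 * math.pi / electron_wavelength(kinetic_energy_ev)


if __name__ == "__main__":
    lam = electron_wavelength(100e3)
    print(f"wavelength = {lam * 1e12:.3f} pm, k = {wave_number(100e3):.3e} 1/m")
```

For 100 keV electrons this gives a wavelength of roughly 3.7 pm, so $kR \gg 1$ holds for any macroscopic viewing distance R.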




The relation (3.244) is known as the Fresnel-Kirchhoff relation
for historical reasons. It is a general solution of the Helmholtz
equation, expressed in integral form. It represents an approximation,
which is only valid in the limit where the wavelength
$\lambda \ll R$, that is, the wavelength is small compared with the
viewing distance. Within this approximation, the specification of
$u_0(\mathbf{r}_0, z_0)$ is quite general. In practice, it depends on
the distribution of physical sources behind the screen. In the
special case where the screen is uniformly illuminated at normal
incidence from behind by a monochromatic plane wave,
$u_0(\mathbf{r}_0, z_0)$ is independent of $\mathbf{r}_0$, and
comes outside the integral as a leading factor.


The integrand in (3.244) includes an outgoing spherical wave
emanating from the point $\mathbf{x}_0$. The integral represents a
coherent summation of all spherical waves emanating from within the
aperture. Physically, this is an expression of Huygens' principle.
This in turn determines the downstream amplitude
$u(\mathbf{r}, z)$, given a known amplitude $u_0(\mathbf{r}_0, z_0)$
in the plane of the screen $z_0$. The intensity in the plane z is
then given by $|u(\mathbf{r}, z)|^2$. We will see in the following
sections that the Fresnel-Kirchhoff equation (3.244) can be used
in a very practical way to understand the intensity distribution for
a rich variety of configurations.


Problem


Show by direct substitution that the solution (3.244) satisfies the 
time-independent wave equation (3.230). 


3.3.2 The Fresnel and Fraunhofer approxima- 
tions 


The Fresnel-Kirchhoff relation (3.244) is amenable to numerical
integration to obtain the amplitude $u(\mathbf{r}, z)$ without
further approximation. In this section we make several
approximations which permit straightforward analytical evaluation of
the integral. This approach allows a more direct physical insight
for a large variety of interesting cases. Assuming small angles, the
ray slope is much less than unity, in which case

\[
\cos(\mathbf{n}, \mathbf{x} - \mathbf{x}_0) \approx 1. \tag{3.245}
\]

We further adopt the simplifying approximation

\[
|\mathbf{x} - \mathbf{x}_0| = \sqrt{(\mathbf{r} - \mathbf{r}_0)^2 + Z^2}
\approx Z + \frac{1}{2Z} \left( r^2 + r_0^2 - 2\, \mathbf{r} \cdot \mathbf{r}_0 \right), \tag{3.246}
\]

where $Z = z - z_0$ is the drift length. This is often referred to
as the parabolic approximation, as the spherical wavefront is
approximated by a parabolic surface for small angles. With these
approximations, (3.244) reduces to

\[
u(\mathbf{r}, z) = \frac{1}{i\lambda Z} \exp\left[ ik \left( Z + \frac{r^2}{2Z} \right) \right]
\int d^2r_0 \, u_0(\mathbf{r}_0, z_0) \exp\left[ \frac{ik}{2Z} \left( r_0^2 - 2\, \mathbf{r} \cdot \mathbf{r}_0 \right) \right]. \tag{3.247}
\]

This is known as the Fresnel approximation.


Next we investigate the special case where

\[
\frac{k r_0^2}{2Z} \ll 2\pi \tag{3.248}
\]

at all positions $\mathbf{r}_0$. Mathematically, the phase shift due
to the first term in the exponent is negligible.


Recalling that $k = 2\pi/\lambda$, this is equivalent to

\[
r_0 \left( \frac{r_0}{Z} \right) \ll 2\lambda, \tag{3.249}
\]

where $r_0/Z$ is the tangent of the angle subtended on the central
axis at the end plane. It follows that the first term in the exponent
can be ignored. In this case (3.247) reduces to

\[
u(\mathbf{r}, z) = \frac{1}{i\lambda Z} \exp\left[ ik \left( Z + \frac{r^2}{2Z} \right) \right]
\int d^2r_0 \, u_0(\mathbf{r}_0, z_0) \exp\left( -\frac{ik\, \mathbf{r} \cdot \mathbf{r}_0}{Z} \right). \tag{3.250}
\]


This is referred to as the Fraunhofer approximation. This approx- 
imation is valid for Z sufficiently large, that is, the observation 
plane is sufficiently far removed from the plane of the screen. We 
see from (3.250) that the amplitude u(r, z) is proportional to the 
Fourier transform of uo(ro, zo) with the transform variable kr/Z, 
where r/Z is the tangent of the viewing angle in the observation 
plane. 
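
The Fourier-transform character of (3.250) can be checked numerically in one transverse dimension. The sketch below is a minimal illustration, assuming a uniformly illuminated slit ($u_0 = 1$ for $|x_0| \le w$) and dropping the leading factor in front of the integral; it compares a midpoint-rule evaluation of the integral with its closed form $2w \sin(q)/q$, $q = kxw/Z$. The function names and the numerical values (50 nm half-width, 4 pm wavelength, 1 m drift) are hypothetical choices, not taken from the text.

```python
import cmath
import math


def fraunhofer_slit(x, half_width, wavelength, Z, n=2000):
    """One-dimensional analogue of the integral in (3.250) for u0 = 1
    inside a slit |x0| <= half_width, evaluated by the midpoint rule."""
    k = 2.0 * math.pi / wavelength
    dx0 = 2.0 * half_width / n
    total = 0j
    for j in range(n):
        x0 = -half_width + (j + 0.5) * dx0
        total += cmath.exp(-1j * k * x * x0 / Z) * dx0
    return total


def slit_closed_form(x, half_width, wavelength, Z):
    """Closed form of the same integral: 2 w sin(q) / q with q = k x w / Z."""
    q = 2.0 * math.pi / wavelength * x * half_width / Z
    if q == 0.0:
        return 2.0 * half_width
    return 2.0 * half_width * math.sin(q) / q


def quadratic_phase(r0, wavelength, Z):
    """Left side of (3.248), k r0^2 / (2 Z), in radians."""
    return math.pi * r0 ** 2 / (wavelength * Z)


if __name__ == "__main__":
    w, lam, Z = 50e-9, 4e-12, 1.0  # illustrative values only
    print("quadratic phase / (2 pi):", quadratic_phase(w, lam, Z) / (2.0 * math.pi))
    for x in (0.0, 10e-6, 20e-6, 40e-6):
        numeric = abs(fraunhofer_slit(x, w, lam, Z)) ** 2
        analytic = slit_closed_form(x, w, lam, Z) ** 2
        print(f"x = {x:.1e} m  numeric = {numeric:.3e}  analytic = {analytic:.3e}")
```

With these values the first intensity zero falls at $x = \lambda Z/(2w)$, and the quadratic phase of (3.248) is far below $2\pi$, so the Fraunhofer form is appropriate for this example.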


The intensity is given by the absolute square of u(r, z). The leading 
phase factor in (3.247, 3.250) drops out in the expression for the 
intensity, and can therefore be ignored. The intensity is directly 
measurable, whereas the amplitude u(r,z) is not. The amplitude 
can only be deduced by measuring the intensity in an interference 
experiment, where the relative phase of the interfering waves is 
precisely known. We therefore ascribe direct physical significance
to the intensity, but not the amplitude.


We have derived a transformation of the wave function u(r, z) 
between successive planes in the drift length of an optical sys- 
tem. To this point we have not assumed any particular symmetry, 
Cartesian, axial, or otherwise. In the following, we will assume ax- 
ial symmetry. This simplifying assumption is applicable to many 
practical systems. 


Next, we wish to incorporate the focusing effects of a lens. This is 
depicted in Figure 3.10, where a thin lens is located at the plane 




Figure 3.10: Path length shift for a thin lens. 


zz. Ideally, rays at all radii r focus to a common point in the plane 
zy. This ideal focusing only occurs for rays close to the optic axis. 
We therefore refer to this ideal focusing as the paraxial approxi- 
mation. Considering the extreme ray, we see by striking a circular 
arc that its path length is longer than the axial ray by a distance 
d. The circular arc coincides with a surface of constant phase for 
a wave converging to the image point. From the Pythagorean theorem,

\[
r^2 + f^2 = (f + d)^2 \approx f^2 + 2fd, \tag{3.251}
\]

where we assume $d \ll 2f$. In this approximation we have

\[
d = \frac{r^2}{2f}. \tag{3.252}
\]

This gives rise to a phase shift $-kd$ at the plane of the thin lens
$z_1$. Equivalently, the wave function is multiplied by a phase
factor given by

\[
L(r) = \exp\left( -\frac{ikr^2}{2f} \right) \tag{3.253}
\]

for the paraxial approximation (no aberration).
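
The quality of the approximation leading to (3.252) is easy to check numerically. The following sketch, with hypothetical function names and example values, compares the exact path excess $\sqrt{r^2 + f^2} - f$ with the paraxial value $r^2/(2f)$ and evaluates the phase factor (3.253).

```python
import cmath
import math


def path_excess_exact(r, f):
    """Exact extra path length of the ray at radius r: sqrt(r^2 + f^2) - f."""
    return math.hypot(r, f) - f


def path_excess_paraxial(r, f):
    """Paraxial value d = r^2 / (2 f) from (3.252)."""
    return r * r / (2.0 * f)


def thin_lens_factor(r, f, wavelength):
    """Phase factor L(r) = exp(-i k r^2 / (2 f)) from (3.253)."""
    k = 2.0 * math.pi / wavelength
    return cmath.exp(-1j * k * path_excess_paraxial(r, f))


if __name__ == "__main__":
    f = 2e-3  # focal length in metres, illustrative only
    for r in (2e-6, 2e-5, 2e-4):
        exact = path_excess_exact(r, f)
        paraxial = path_excess_paraxial(r, f)
        print(f"r/f = {r / f:.0e}  relative error = {abs(exact - paraxial) / paraxial:.2e}")
```

The relative error grows as $(r/f)^2/4$, which is why the parabolic form is adequate for the small angles typical of charged particle lenses.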


Using these transformations, we can build up a simple optical 
system. We apply successive transformations, first for the object 
space, followed by the lens, and finally followed by the image space. 
We define 


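\[
\begin{aligned}
z_0 &= \text{object plane} \\
z_1 &= \text{lens plane} \\
z &= \text{recording plane} \\
Z_1 &= z_1 - z_0 = \text{object distance} \\
Z_2 &= z - z_1 = \text{image distance.}
\end{aligned} \tag{3.254}
\]


We further denote $\mathbf{r}_0$, $\mathbf{r}_1$, and $\mathbf{r}$
as the two-dimensional position vectors in the object, lens, and
recording planes, respectively.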


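We assume a pupil located at the lens plane $z_1$. By successive
transformations, interchanging the order of integrations, it is
straightforward to show that

\[
u(\mathbf{r}, z) = -\frac{1}{\lambda^2 Z_1 Z_2} \exp\left[ ik \left( Z_1 + Z_2 + \frac{r^2}{2Z_2} \right) \right]
\int d^2r_0 \, u_0(\mathbf{r}_0, z_0) \exp\left( \frac{ik r_0^2}{2Z_1} \right) h(\mathbf{r}_0, \mathbf{r}), \tag{3.255}
\]

where we have defined a kernel h given by

\[
h(\mathbf{r}_0, \mathbf{r}) = \int d^2r_1 \, P(\mathbf{r}_1)
\exp\left[ \frac{ik r_1^2}{2} \left( \frac{1}{Z_1} + \frac{1}{Z_2} - \frac{1}{f} \right) \right]
\exp\left[ -ik\, \mathbf{r}_1 \cdot \left( \frac{\mathbf{r}_0}{Z_1} + \frac{\mathbf{r}}{Z_2} \right) \right], \tag{3.256}
\]

where $P(\mathbf{r}_1)$ is the pupil transmission function, equal to
unity in the transmitting area, and zero otherwise.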


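In the special case where P represents a round aperture centered
on the optic axis, it is advantageous to use polar coordinates
$\mathbf{r} = (\rho, \phi)$. We perform the azimuthal integral first,
where $J_0$ is the zero-order Bessel function with integral
representation given by

\[
J_0(x) = \frac{1}{2\pi} \int_0^{2\pi} e^{ix \cos\phi} \, d\phi. \tag{3.257}
\]

From this it follows immediately that

\[
h(\mathbf{r}_0, \mathbf{r}) = 2\pi \int_0^{\infty} d\rho_1 \, \rho_1 \, P(\rho_1)
\exp\left[ \frac{ik \rho_1^2}{2} \left( \frac{1}{Z_1} + \frac{1}{Z_2} - \frac{1}{f} \right) \right]
J_0\left( k \rho_1 \left| \frac{\mathbf{r}_0}{Z_1} + \frac{\mathbf{r}}{Z_2} \right| \right). \tag{3.258}
\]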


This gives a general expression for the optical transformation, in- 
dependent of the location of the start and end planes relative to 
the focal plane of the lens. In the following sections, we apply this 
to several important special cases. 


Problems 
1. Show that for an ideal point object, a transformation (3.250)
from $z_0$ to $z_1$ followed by a second transformation from $z_1$
to $z_2$ is equivalent to a single transformation from $z_0$ to
$z_2$.

2. An electron with kinetic energy 100 keV is normally incident on
a circular object of radius 100 nm. Estimate the shortest distance
Z for which the Fraunhofer approximation is a valid estimate of
the downstream amplitude.


3.3.3 Amplitude in the Gaussian image plane 


We assume an object plane at axial coordinate $z_0$, a thin lens
of focal length f at $z_1$, and a Gaussian image plane at $z_I$. This
geometry is shown in Figure 3.11.

Figure 3.11: Image formation for a point object at $\mathbf{r}_0$.

The relationship between the object distance $Z_1$, the image
distance $Z_2$, and the lens focal length f is given for ideal
imaging as

\[
\frac{1}{Z_1} + \frac{1}{Z_2} = \frac{1}{f}, \qquad M = -\frac{Z_2}{Z_1}, \tag{3.259}
\]

where M is the lateral magnification. An outgoing spherical wave
emanates from the object point at lateral position $\mathbf{r}_0$.
In the limit of perfect imaging (paraxial approximation), an incoming
spherical wave converges on the conjugate image point
$\mathbf{r}_I = M\mathbf{r}_0$. We assume a round aperture of radius
a coplanar with the lens at $z_1$. The pupil function $P(r_1)$ is
unity for $0 \le r_1 \le a$, and zero for $r_1 > a$. Inserting
(3.259) directly into the kernel h in (3.258), we obtain


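\[
h(\mathbf{r}_0, \mathbf{r}_I) = 2\pi \int_0^a dr_1 \, r_1 \,
J_0\left( \frac{k r_1}{Z_2} \left| \mathbf{r}_I - M\mathbf{r}_0 \right| \right). \tag{3.260}
\]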


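This is recognizable as the Bessel transform of the pupil function.
From this it follows immediately that

\[
h(\mathbf{r}_0, \mathbf{r}_I) = 2\pi a^2
\left( \frac{ka \left| \mathbf{r}_I - M\mathbf{r}_0 \right|}{Z_2} \right)^{-1}
J_1\left[ \frac{ka \left| \mathbf{r}_I - M\mathbf{r}_0 \right|}{Z_2} \right], \tag{3.261}
\]

where we have made use of the integral

\[
\int J_0(x) \, x \, dx = x J_1(x). \tag{3.262}
\]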


Substituting in (3.255), we obtain the amplitude in the Gaussian
image plane $z = z_I$ as

\[
u_I(\mathbf{r}_I) = \frac{k^2 a^2}{2\pi Z_1 Z_2} \int d^2r_0 \, u_0(\mathbf{r}_0, z_0)
\exp\left( \frac{ik r_0^2}{2Z_1} \right)
\left( \frac{ka \left| \mathbf{r}_I - M\mathbf{r}_0 \right|}{Z_2} \right)^{-1}
J_1\left( \frac{ka \left| \mathbf{r}_I - M\mathbf{r}_0 \right|}{Z_2} \right), \tag{3.263}
\]

where we have ignored leading phase factors outside the integral,
as such factors do not affect the intensity
$|u_I(\mathbf{r}_I)|^2$. The complex amplitude $u_0$ represents an
extended object. Every object point $\mathbf{r}_0$ can be considered
to be the source of a spherical outgoing wave. The waves emanating
from neighboring object points are assumed to radiate coherently with
respect to one another. This can only happen if all object points
radiate monochromatically with a constant phase relationship. It
therefore represents an approximation.


The waves from all object points propagate coherently through the
optical system. The integral over $\mathbf{r}_0$ represents a
superposition of amplitudes over the entire object plane. The complex
amplitude $u_0$ in the object plane is convolved with the function h
to form the amplitude $u_I$ in the Gaussian image plane. The physical
significance of this can be appreciated by considering the important
special case of a point object on axis. In this case the amplitude
in the object plane is given by

\[
u_0(\mathbf{r}_0) = \delta(\mathbf{r}_0), \tag{3.264}
\]

where the right-hand side is the Dirac delta function. From the
property of the delta function, it follows immediately that

\[
u_I(\mathbf{r}_I) = \frac{k^2 a^2}{2\pi Z_1 Z_2}
\left( \frac{ka\, r_I}{Z_2} \right)^{-1} J_1\left( \frac{ka\, r_I}{Z_2} \right), \tag{3.265}
\]

where the ratio $a/Z_2$ is the tangent of the semiangle of the cone
of rays at the image plane $z_I$. The square of this functional form,
which represents the intensity, is known as an Airy disk. Physically,
this is precisely the diffraction pattern of the aperture. The
kernel h is called the point spread function, since it represents the
blurring of every image point relative to an ideal image.
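
The step from (3.260) to (3.261) can be verified numerically without any special-function library. The sketch below is an illustration under stated assumptions: it evaluates $J_0$ from the integral representation (3.257) and $J_1$ from the standard representation $J_1(x) = \frac{1}{\pi}\int_0^{\pi} \cos(\phi - x \sin\phi)\, d\phi$ (not quoted in the text), then compares the scaled pupil integral $\int_0^1 s\, J_0(qs)\, ds$ with $J_1(q)/q$, where $s = r_1/a$ and $q = ka|\mathbf{r}_I - M\mathbf{r}_0|/Z_2$. All function names are hypothetical.

```python
import math


def bessel_j0(x, n=200):
    """J0 from the integral representation (3.257), midpoint rule in phi.
    The imaginary part of exp(i x cos(phi)) integrates to zero by symmetry."""
    return sum(math.cos(x * math.cos(2.0 * math.pi * (j + 0.5) / n))
               for j in range(n)) / n


def bessel_j1(x, n=200):
    """J1 from (1/pi) * integral over [0, pi] of cos(phi - x sin(phi))."""
    total = 0.0
    for j in range(n):
        phi = math.pi * (j + 0.5) / n
        total += math.cos(phi - x * math.sin(phi))
    return total / n


def pupil_integral(q, n=2000):
    """Midpoint-rule value of the integral of s * J0(q s) over 0 <= s <= 1,
    i.e. the integral in (3.260) in units of a^2."""
    ds = 1.0 / n
    return sum((j + 0.5) * ds * bessel_j0(q * (j + 0.5) * ds)
               for j in range(n)) * ds


def airy_amplitude(q):
    """Normalised point spread amplitude 2 J1(q) / q, equal to 1 at q = 0."""
    return 1.0 if q == 0.0 else 2.0 * bessel_j1(q) / q


if __name__ == "__main__":
    for q in (0.5, 2.0, 3.8317059702075125, 6.0):
        print(f"q = {q:.4f}  integral = {pupil_integral(q):+.6f}  "
              f"J1(q)/q = {bessel_j1(q) / q:+.6f}  intensity = {airy_amplitude(q) ** 2:.6f}")
```

The first dark ring of the Airy pattern lies at the first zero of $J_1$, $q \approx 3.8317$, which corresponds to an image-plane radius of about $0.61\, \lambda Z_2 / a$.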


To this point we have assumed imaging without aberrations. We
now inquire into the effect of spherical aberration. This is depicted
in Figure 3.12, where the spherical aberration gives rise to an
additional path length increment $d_S$. The spherical aberration in
the Gaussian image plane was found earlier to be

\[
\delta r_S = C_S \alpha^3, \tag{3.266}
\]

where $\alpha$ is the semiangle made by the extreme ray with the
optic axis. Applying the Pythagorean theorem,

\[
(r + \delta r_S)^2 + f^2 = (f + d + d_S)^2, \tag{3.267}
\]

where we wish to solve for the path length increment $d_S$. For small
angles, this is approximated by

\[
d_S \approx C_S \left( \frac{r}{f} \right)^4. \tag{3.268}
\]

In this approximation, the resulting phase shift is $-k d_S$.

Figure 3.12: Path length shift for a thin lens with spherical
aberration.


We next investigate the behavior of the wave function in a plane
which is slightly displaced from the Gaussian image plane by a
defocus distance $\delta f$. Recalling the earlier expression for the
change of path length d for a thin lens of focal length f, we replace
f by $f + \delta f$. Retaining only terms through first order in
$\delta f$, this leads to a path length increment due to defocus
given by

\[
d_f = -\frac{r^2 \, \delta f}{2f^2}, \tag{3.269}
\]

where this in turn leads to a phase shift $-k d_f$. Taking spherical
aberration and defocus into account, the expression (3.253) for a
thin lens is modified as a multiplicative phase factor given by

\[
L_f(r) = \exp\left[ -\frac{ikr^2}{2f} \left( 1 - \frac{\delta f}{f} + \frac{2 C_S r^2}{f^3} \right) \right]. \tag{3.270}
\]

Substituting this phase factor into the kernel
$h(\mathbf{r}_0, \mathbf{r}_I)$, we obtain the modified expression
for the case with spherical aberration and defocus present,

\[
h(\mathbf{r}_0, \mathbf{r}_I) = 2\pi \int_0^a dr_1 \, r_1
\exp\left[ \frac{ik r_1^2}{2f} \left( \frac{\delta f}{f} - \frac{2 C_S r_1^2}{f^3} \right) \right]
J_0\left( \frac{k r_1}{Z_2} \left| \mathbf{r}_I - M\mathbf{r}_0 \right| \right). \tag{3.271}
\]

The resulting complex wave function in the Gaussian image plane
is

\[
u_I(\mathbf{r}_I) = -\frac{\exp\left[ ik (Z_1 + Z_2) \right]}{\lambda^2 Z_1 Z_2}
\int d^2r_0 \, u_0(\mathbf{r}_0) \exp\left( \frac{ik r_0^2}{2Z_1} \right) h(\mathbf{r}_0, \mathbf{r}_I), \tag{3.272}
\]

recalling that $\lambda = 2\pi/k$. The leading phase factor can be
ignored, since it does not appear in the intensity $|u_I|^2$. The
phase factor under the integral approaches unity for
$k r_0^2/(2Z_1) \ll 2\pi$. However, one must exercise caution before
making this approximation for an energetic charged particle, since
the wave number k is often quite large.


3.3.4 Amplitude in the diffraction plane


We now consider the special case where

\[
Z_1 = Z_2 = f. \tag{3.273}
\]

This is shown schematically in Figure 3.13. We see that a given
ray slope in the object plane $z_0$ is mapped to a specific, single
transverse position in the diffraction plane $z_D$, regardless of
transverse position in the object plane. The diffraction plane $z_D$
is the plane where a diffraction pattern of a periodic object comes
into sharp focus.

Figure 3.13: Formation of a diffraction pattern.


Evaluating the kernel h in (3.258) between the object plane
$z = z_0$ and the diffraction plane $z = z_D$, we find

\[
h(\mathbf{r}_0, \mathbf{r}_D) = 2\pi \int_0^{\infty} dr_1 \, r_1
\exp\left( \frac{ik r_1^2}{2f} \right)
J_0\left( \frac{k r_1}{f} \left| \mathbf{r}_0 + \mathbf{r}_D \right| \right). \tag{3.274}
\]

We have assumed no aperture, in which case $P(r_1) = 1$ for all
$r_1$.
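It follows that

\[
h(\mathbf{r}_0, \mathbf{r}_D) = i\lambda f \exp\left[ -\frac{ik}{2f} \left( \mathbf{r}_0 + \mathbf{r}_D \right)^2 \right], \tag{3.275}
\]

where we have made use of the integral

\[
\int_0^{\infty} \exp\left( i\alpha x^2 \right) J_0(\beta x) \, x \, dx
= \frac{i}{2\alpha} \exp\left( -\frac{i\beta^2}{4\alpha} \right), \qquad (\alpha \ne 0). \tag{3.276}
\]

The amplitude $u_D(\mathbf{r}_D)$ in the diffraction plane is given
by

\[
u_D(\mathbf{r}_D) = \frac{1}{i\lambda f} \exp\left[ ik \left( 2f + \frac{r_D^2}{2f} \right) \right]
\int d^2r_0 \, u_0(\mathbf{r}_0) \exp\left( \frac{ik r_0^2}{2f} \right)
\exp\left[ -\frac{ik}{2f} \left( \mathbf{r}_0 + \mathbf{r}_D \right)^2 \right]. \tag{3.277}
\]

Expanding,

\[
\left( \mathbf{r}_0 + \mathbf{r}_D \right)^2 = r_0^2 + r_D^2 + 2\, \mathbf{r}_0 \cdot \mathbf{r}_D. \tag{3.278}
\]

Substituting into (3.277), it follows that the amplitude in the
diffraction plane is given by

\[
u_D(\mathbf{r}_D) = \frac{1}{i\lambda f} \int d^2r_0 \, u_0(\mathbf{r}_0)
\exp\left( -\frac{ik\, \mathbf{r}_0 \cdot \mathbf{r}_D}{f} \right), \tag{3.279}
\]

ignoring the leading exponential phase factor, as this does not
influence the intensity $|u_D|^2$. This is recognizable as a Fourier
transform of the object, with the transform variable
$k\mathbf{r}_D/f$. For this reason, the diffraction plane $z_D$ is
often referred to as the Fourier plane.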


Geometrically, each specific value of the ray slope in the object
plane is mapped into a unique position in the Fourier plane. This
enables one to directly obtain an intensity map of a diffraction
pattern. We notice that $r_D/f$ is the ray slope at the object.
Equivalently, this is the tangent of the diffraction angle.


Constructive interference occurs when the path difference between 
neighboring rays is an integral number of wavelengths. This can 
be seen in the figure, where the surfaces of constant phase form 
concentric spheres centered on each object point. All object points 
are assumed to radiate coherently with respect to each other. 
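
For a periodic object the Fourier kernel of (3.279) makes the condition for constructive interference explicit: contributions from points one period apart add in phase when $k r_D d / f = 2\pi n$. The sketch below tabulates the resulting paraxial order positions $r_D = n \lambda f / d$. It is an illustration only; the function names and the values of the period d, wavelength, and focal length are hypothetical.

```python
import math


def order_positions(period, wavelength, focal_length, max_order):
    """Paraxial positions r_D = n * lambda * f / period of the diffraction
    orders n = -max_order .. max_order in the Fourier plane."""
    return [n * wavelength * focal_length / period
            for n in range(-max_order, max_order + 1)]


def kernel_phase(r0, r_d, wavelength, focal_length):
    """Phase k * r0 * r_D / f in the exponent of (3.279), one dimension."""
    return 2.0 * math.pi / wavelength * r0 * r_d / focal_length


if __name__ == "__main__":
    period, lam, f = 0.2e-9, 4e-12, 2e-3  # illustrative values only
    for n, r_d in zip(range(-2, 3), order_positions(period, lam, f, 2)):
        phase = kernel_phase(period, r_d, lam, f) / (2.0 * math.pi)
        print(f"order {n:+d}: r_D = {r_d * 1e6:+.1f} um, phase per period = {phase:+.1f} x 2 pi")
```

Each order sits at a ray slope $r_D/f = n\lambda/d$, which is the small-angle form of the usual grating condition.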


3.3.5 Optical transformation for a general 
imaging system with coherent illumina- 
tion 


In the preceding analysis, we obtained the optical transformation
for a simple system in two specific configurations, each employing
a single lens with focal length f. This was done for the formation
of an image, and separately, formation of a diffraction pattern. We
assumed perfectly coherent illumination, where a constant phase 
relationship exists between all points of the object, and the illu- 
mination is monochromatic. The basic optical elements are a drift 
length, a thin lens, and a pupil. In principle, these can be applied 
in any order, and to any degree of complexity, to build up an arbi- 
trary optical system. Given these mathematical tools, we are now 
in a position to consider a general imaging system, consisting of 
an arbitrary configuration of optical elements. In this section and 
the next, we closely follow the analysis of Goodman [36]. 


We require only that an image be formed. Here the object is
represented by a complex amplitude $u_0(\mathbf{r}_0)$, and the image
is represented by a complex amplitude $u_I(\mathbf{r}_I)$. The
relevant quantity of physical interest is $|u_I(\mathbf{r}_I)|^2$,
which is the intensity measured in the image plane $z_I$. We seek an
optical transformation which expresses $u_I$ in terms of $u_0$. Such
a transformation would contain all of the physically relevant
information about the quality of the image; i.e., how closely the
image approximates an ideal replica of the object. In order to be
useful, this procedure must account for aberrations, defocus, and
diffraction with a finite pupil, all of which tend to degrade the
image.


We begin our analysis by considering a specific hypothetical
optical configuration, consisting of two ideal lenses of focal
lengths $f_1$ and $f_2$, respectively. This is shown schematically in
Figure 3.17. By assumption, a physical aperture is located in the
Fourier plane $z_A$ of the first lens. Furthermore, the image plane
is assumed to lie in the Fourier plane of the second lens. The back
focal plane of the first lens thus coincides with the front focal
plane of the second lens. By inspection, it is easy to see that a
real image of the original object in the plane $z_0$ is formed in the
plane $z_I$. The magnification is given by

\[
M = -\frac{f_2}{f_1}, \tag{3.280}
\]

where the minus sign indicates that the image is inverted with
respect to the object. From (3.250) the amplitude
$u_A(\mathbf{r}_A)$ is expressed in terms of the object
$u_0(\mathbf{r}_0)$ in the Fraunhofer approximation by

\[
u_A(\mathbf{r}_A) = \frac{1}{i\lambda f_1} \int d^2r_0 \, u_0(\mathbf{r}_0)
\exp\left( -\frac{ik\, \mathbf{r}_0 \cdot \mathbf{r}_A}{f_1} \right). \tag{3.281}
\]

Similarly, the amplitude $u_I(\mathbf{r}_I)$ is expressed in terms of
$u_A(\mathbf{r}_A)$ by

\[
u_I(\mathbf{r}_I) = \frac{1}{i\lambda f_2} \int d^2r_A \, u_A(\mathbf{r}_A) P(\mathbf{r}_A)
\exp\left( -\frac{ik\, \mathbf{r}_A \cdot \mathbf{r}_I}{f_2} \right), \tag{3.282}
\]

where $P(\mathbf{r}_A)$ is the pupil function. Substituting the first
of these equations into the second, and interchanging the order of
integrations, we obtain

\[
u_I(\mathbf{r}_I) = -\frac{1}{\lambda^2 f_1 f_2} \int d^2r_0 \, u_0(\mathbf{r}_0)
\int d^2r_A \, P(\mathbf{r}_A)
\exp\left[ -\frac{ik}{f_2}\, \mathbf{r}_A \cdot \left( \mathbf{r}_I - M\mathbf{r}_0 \right) \right], \tag{3.283}
\]

where we have made use of the expression for the magnification M
above.


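We now define a two-vector position $\mathbf{r}_G$ in the image
plane $z_I$ as

\[
\mathbf{r}_G = M\mathbf{r}_0. \tag{3.284}
\]

The position $\mathbf{r}_G$ represents the position $\mathbf{r}_0$ in
the object plane $z_0$ transferred to the Gaussian image plane $z_I$
by ideal imaging in the limit of geometrical optics. Furthermore, we
define an amplitude $u_G(\mathbf{r}_G)$ in the Gaussian image plane
as

\[
u_G(\mathbf{r}_G) = \frac{1}{M} u_0(\mathbf{r}_0) = \frac{1}{M} u_0\left( \frac{\mathbf{r}_G}{M} \right), \tag{3.285}
\]

where the object function $u_0(\mathbf{r}_0)$ is assumed to be known.
Thus $u_G(\mathbf{r}_G)$ represents the ideal image. By inspection,
this preserves the normalization, namely,

\[
\int d^2r_G \, |u_G(\mathbf{r}_G)|^2 = \int d^2r_0 \, |u_0(\mathbf{r}_0)|^2, \tag{3.286}
\]

where $d^2r_G = M^2 \, d^2r_0$. This allows us to write

\[
u_I(\mathbf{r}_I) = \int d^2r_G \, u_G(\mathbf{r}_G) \, H(\mathbf{r}_I - \mathbf{r}_G), \tag{3.287}
\]

where we have defined a new kernel H from (3.283, 3.284, 3.285)
by

\[
H(\mathbf{r}_I - \mathbf{r}_G) = \frac{1}{\lambda^2 f_2^2} \int d^2r_A \, P(\mathbf{r}_A)
\exp\left[ -\frac{ik}{f_2}\, \mathbf{r}_A \cdot \left( \mathbf{r}_I - \mathbf{r}_G \right) \right]. \tag{3.288}
\]

We see from the form of (3.287) that H is a point spread function,
and from (3.288) that H is the Fourier transform of the pupil
function P.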


It is informative to study this in the Fourier space of spatial
frequencies. We define the two-dimensional Fourier transforms

\[
\begin{aligned}
\tilde{u}_I(\mathbf{K}) &= \int d^2\xi \, u_I(\boldsymbol{\xi}) \, e^{-i\mathbf{K} \cdot \boldsymbol{\xi}} \\
\tilde{u}_G(\mathbf{K}) &= \int d^2\xi \, u_G(\boldsymbol{\xi}) \, e^{-i\mathbf{K} \cdot \boldsymbol{\xi}} \\
\tilde{H}(\mathbf{K}) &= \int d^2\xi \, H(\boldsymbol{\xi}) \, e^{-i\mathbf{K} \cdot \boldsymbol{\xi}},
\end{aligned} \tag{3.289}
\]

where $\mathbf{K}$ is the two-vector transform variable, and the
integration variable $\boldsymbol{\xi}$ is a two-vector position
having mathematical significance, but no particular physical
significance. The physical significance of $\mathbf{K}$ can be
understood by considering a sinusoidal object, with spatial period
$\Lambda_I$ in the image plane. In this case,

\[
K = \frac{2\pi}{\Lambda_I} \tag{3.290}
\]

in one Cartesian axis. Thus K is $2\pi$ times the spatial frequency
$1/\Lambda_I$. Applying the convolution theorem (see Appendix A) to
(3.287) it follows immediately that

\[
\tilde{u}_I(\mathbf{K}) = \tilde{u}_G(\mathbf{K}) \, \tilde{H}(\mathbf{K}). \tag{3.291}
\]

Thus, the spatial frequency spectrum of the ideal image is modulated
by $\tilde{H}$ to yield the spatial frequency spectrum of the actual
image. For this reason, $\tilde{H}$ is called the amplitude transfer
function or ATF.


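To understand the physical significance of this, we substitute in
the expression for the kernel H. This gives

\[
\tilde{H}(\mathbf{K}) = \int d^2\xi \, e^{-i\mathbf{K} \cdot \boldsymbol{\xi}} \,
\frac{1}{\lambda^2 f_2^2} \int d^2r_A \, P(\mathbf{r}_A)
\exp\left( -\frac{ik}{f_2}\, \mathbf{r}_A \cdot \boldsymbol{\xi} \right). \tag{3.292}
\]

Interchanging the order of integrations, this gives

\[
\tilde{H}(\mathbf{K}) = \int d^2r_A \, P(\mathbf{r}_A) \, \frac{1}{\lambda^2 f_2^2}
\int d^2\xi \exp\left[ -i \boldsymbol{\xi} \cdot \left( \mathbf{K} + \frac{k}{f_2} \mathbf{r}_A \right) \right]. \tag{3.293}
\]

We define a new integration variable $\boldsymbol{\eta}$ by the
substitution

\[
\boldsymbol{\xi} = \frac{f_2}{k} \boldsymbol{\eta}, \qquad
d^2\xi = \left( \frac{f_2}{k} \right)^2 d^2\eta. \tag{3.294}
\]

Substituting, this yields

\[
\tilde{H}(\mathbf{K}) = \int d^2r_A \, P(\mathbf{r}_A)
\left\{ \frac{1}{(2\pi)^2} \int d^2\eta \exp\left[ -i \boldsymbol{\eta} \cdot \left( \mathbf{r}_A + \frac{f_2}{k} \mathbf{K} \right) \right] \right\}, \tag{3.295}
\]

remembering that $k = 2\pi/\lambda$. We recognize the expression in
curly brackets as a Dirac delta function in two dimensions, where

\[
\delta\left( \mathbf{r}_A + \frac{f_2}{k} \mathbf{K} \right)
= \frac{1}{(2\pi)^2} \int d^2\eta \exp\left[ -i \boldsymbol{\eta} \cdot \left( \mathbf{r}_A + \frac{f_2}{k} \mathbf{K} \right) \right]. \tag{3.296}
\]

By the property of the delta function, we immediately perform the
integration over $\mathbf{r}_A$, yielding

\[
\tilde{H}(\mathbf{K}) = P\left( -\frac{f_2}{k} \mathbf{K} \right). \tag{3.297}
\]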


Mathematically, the amplitude transfer function $\tilde{H}$ is the
scaled pupil function. This result is quite general, in that it
applies to any aperture, which can be represented by a pupil function
P. In the special case of a round aperture of radius a, we have
P = 0 for $K > ka/f_2$. This value of K represents $2\pi$ times a
cutoff spatial frequency, above which no information is transmitted.
The amplitude transfer function is plotted in Figure 3.14.
Physically, the pupil cuts off all diffracted orders with spatial
frequency larger than the cutoff frequency. The aperture thus acts as
a low-pass filter for spatial frequencies. The absence of high
spatial frequencies in the image translates to blur.
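
The low-pass behaviour described above can be expressed in a few lines. The sketch below is a minimal illustration, with hypothetical function names and example values, of the cutoff $K = ka/f_2$ for a round aperture and of the finest image-plane period $2\pi/K = \lambda f_2/a$ that is still transmitted under coherent illumination.

```python
import math


def atf_cutoff(aperture_radius, focal_length, wavelength):
    """Cutoff K = k a / f2 of the amplitude transfer function, k = 2 pi / lambda."""
    return 2.0 * math.pi * aperture_radius / (wavelength * focal_length)


def atf_round(K, aperture_radius, focal_length, wavelength):
    """Amplitude transfer function (3.297) for a round pupil:
    1 inside the cutoff, 0 outside."""
    cutoff = atf_cutoff(aperture_radius, focal_length, wavelength)
    return 1.0 if abs(K) <= cutoff else 0.0


def smallest_period(aperture_radius, focal_length, wavelength):
    """Image-plane period 2 pi / K_cutoff = lambda f2 / a of the finest
    sinusoidal component that is still transmitted."""
    return wavelength * focal_length / aperture_radius


if __name__ == "__main__":
    a, f2, lam = 10e-6, 2e-3, 4e-12  # illustrative values only
    print(f"cutoff K = {atf_cutoff(a, f2, lam):.3e} 1/m")
    print(f"smallest transmitted period = {smallest_period(a, f2, lam) * 1e9:.2f} nm")
```

Any sinusoidal component of the ideal image with period shorter than $\lambda f_2/a$ is removed entirely, which is the blur referred to in the text.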


With this preparation, we are now in a position to address the
response of an arbitrary optical system. To this end we state a key
hypothesis, namely, every optical system, however complicated,
can be represented for analytical purposes by an equivalent two-lens
confocal system shown schematically in Figure 3.17. The confocal
system represents the optical transfer of object to image in
the paraxial approximation, since both lenses of the confocal system
are assumed to be ideal. This system also properly represents
the effect of diffraction at the exit pupil. We assume the beam
kinetic energy to be constant in the equivalent confocal system, and
equal to the landing energy in the image plane of the real system.


Figure 3.14: Amplitude transfer function, round aperture, paraxial
approximation.


In order for the confocal system to properly represent the real
system, we must account for aberrations and defocus. The geometrical
aberrations of the real system depend on the coordinates
$(x_0, y_0)$ in the object plane, and the coordinates $(x_A, y_A)$ in
the aperture plane of the real system. For now, we assume the object
coordinates to be fixed, and the aperture coordinates to be variable.
It follows that the coordinates $(x_G, y_G)$ of the ideal image
are fixed as well. We regard the coordinates $(x_I, y_I)$ in the
image plane to be variable. A cone of rays impinges on the image
point, where each ray in the cone intersects a unique point
$(x_A, y_A)$ in the aperture plane. This is true both in the real
system and the equivalent confocal system. Each ray has a unique
amount of aberration, which is expressed as an incremental shift
$\delta V_{OI}$ in optical path length of the real system. The
primary aberration is represented in the special case of axial
symmetry by
Vor = öf “mdz = J f ma dz. (3.298) 
zo zo 


An explicit expression for this was given earlier in (2.213). At
this point we make a key assumption, namely, the shift
$\delta V_{OI}$ is applied discontinuously and entirely in the
aperture plane $z_A$ for the equivalent confocal system. This is
conceptually equivalent to inserting a phase plate in the aperture
plane, where the phase shift varies with coordinates $(x_A, y_A)$ in
the equivalent confocal system. Mathematically, we multiply
$P(\mathbf{r}_A)$ in (3.282) by a phase factor. This amounts to
making the substitution

\[
P(\mathbf{r}_A) \to P(\mathbf{r}_A) \exp\left[ ik \, \delta V_{OI}(\mathbf{r}_A) \right] \tag{3.299}
\]

in the expression (3.288) for $H(\mathbf{r}_I - \mathbf{r}_G)$. The
phase shift locally distorts the wave front in the aperture plane of
the confocal system. This, in turn, deflects the classical ray by a
small amount, since the canonical momentum vector is locally normal
to the wave front. This results in a lateral displacement of the ray
in the image plane, as depicted schematically by the broken lines in
Figure 3.17.


Next we inquire into the effect of defocus. We represent this as
a small shift of $\delta f$ in the focal length $f_2$. We thus make
the replacement

\[
f_2 \to f_2 + \delta f = f_2 \left( 1 + \frac{\delta f}{f_2} \right). \tag{3.300}
\]

Retaining only terms to first order in $\delta f$, this leads to the
replacement

\[
\frac{1}{f_2} \to \frac{1}{f_2} \left( 1 - \frac{\delta f}{f_2} \right) \tag{3.301}
\]

in (3.288). The complete expression for the kernel H is thus given
in the presence of aberrations and defocus as

\[
H(\mathbf{r}_I - \mathbf{r}_G) = \frac{1}{\lambda^2 f_2^2} \int d^2r_A \, P(\mathbf{r}_A)
\exp\left[ -\frac{ik}{f_2}\, \mathbf{r}_A \cdot \left( \mathbf{r}_I - \mathbf{r}_G \right) \right]
\exp\left\{ ik \left[ \delta V_{OI}(\mathbf{r}_A) + \frac{\delta f}{f_2^2}\, \mathbf{r}_A \cdot \left( \mathbf{r}_I - \mathbf{r}_G \right) \right] \right\}, \tag{3.302}
\]

where the integral is performed over the aperture plane of the
equivalent confocal system. The aberrations and defocus are contained
in the final phase factor on the right side. The amplitude
$u_I(\mathbf{r}_I)$ is given by (3.287) with the point spread
function H given by (3.302). This provides a quantitative assessment
of image fidelity for a general optical system with arbitrary
configuration. It thus represents the main result of this section.


3.3.6 Optical transformation for a general 
imaging system with incoherent illumi- 
nation 


In the preceding section, the illumination was assumed to be co- 
herent. Ideally, this means that the illumination of the object plane 
is perfectly monochromatic, corresponding to a single eigenstate 
of definite energy and momentum. It also means that all points in
the object plane radiate with a constant phase relationship to
one another. According to the postulates of quantum mechanics,
the amplitudes for alternative paths are added in the measurement 
plane, with the absolute square of the resultant amplitude giving 
the intensity. 


In this section, we consider the case of incoherent illumination.
By definition, this implies that neighboring object points radiate
independently, with relative phase completely uncorrelated. In
this case the resulting intensity at the image plane is calculated by
adding intensities from alternative paths. We define the intensity
in the object and image planes respectively as

\[
\begin{aligned}
I_0(\mathbf{r}_0) &= |u_0(\mathbf{r}_0)|^2 \\
I_I(\mathbf{r}_I) &= |u_I(\mathbf{r}_I)|^2.
\end{aligned} \tag{3.303}
\]

The ideal intensity in the image plane is a perfect magnified replica
of the object intensity in the limit of geometrical optics. We define
this as

\[
I_G(\mathbf{r}_G) = \frac{1}{M^2} I_0\left( \frac{\mathbf{r}_G}{M} \right). \tag{3.304}
\]

From (3.287), and the fact that
$I_I(\mathbf{r}_I) = |u_I(\mathbf{r}_I)|^2$, we obtain

\[
I_I(\mathbf{r}_I) = \left| \int d^2r_G \, u_G(\mathbf{r}_G) H(\mathbf{r}_I - \mathbf{r}_G) \right|^2
= \int d^2r_G \, u_G(\mathbf{r}_G) H(\mathbf{r}_I - \mathbf{r}_G)
\int d^2r_G' \, u_G^*(\mathbf{r}_G') H^*(\mathbf{r}_I - \mathbf{r}_G'), \tag{3.305}
\]

where $\mathbf{r}_G = M\mathbf{r}_0$ is the object point transferred
to the Gaussian image plane by ideal imaging. At this point we make a
key assumption, namely, that total incoherence implies that only
points where $\mathbf{r}_G' = \mathbf{r}_G$ contribute to the result.
Mathematically, this is equivalent to inserting a delta function
$\delta(\mathbf{r}_G' - \mathbf{r}_G)$ inside the integral over
$\mathbf{r}_G'$. This leads to the intensity in the Gaussian image
plane $z_I$ as

\[
I_I(\mathbf{r}_I) = \int d^2r_G \, I_G(\mathbf{r}_G) \left| H(\mathbf{r}_I - \mathbf{r}_G) \right|^2. \tag{3.306}
\]

We define a new function

\[
J(\mathbf{r}_I - \mathbf{r}_G) = \left| H(\mathbf{r}_I - \mathbf{r}_G) \right|^2. \tag{3.307}
\]

This leads to

\[
I_I(\mathbf{r}_I) = \int d^2r_G \, I_G(\mathbf{r}_G) \, J(\mathbf{r}_I - \mathbf{r}_G). \tag{3.308}
\]

Evidently, $J(\mathbf{r}_I - \mathbf{r}_G)$ represents the intensity
point spread function for the special case of incoherent
illumination.
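
The difference between adding amplitudes, as in (3.287), and adding intensities, as in (3.308), is easiest to see for two point objects. The sketch below is an illustration under stated assumptions: it uses a hypothetical one-dimensional amplitude point spread function $\sin(\pi x/w)/(\pi x/w)$ rather than the Airy form derived earlier, and all function names and parameters are invented for the example.

```python
import cmath
import math


def psf_amplitude(x, width):
    """Hypothetical 1D amplitude point spread function sin(pi x/w) / (pi x/w)."""
    if x == 0.0:
        return 1.0
    q = math.pi * x / width
    return math.sin(q) / q


def coherent_intensity(x, separation, width, phase=0.0):
    """Two point objects with a fixed relative phase: amplitudes are
    added first, then the absolute square is taken."""
    a1 = psf_amplitude(x - separation / 2.0, width)
    a2 = cmath.exp(1j * phase) * psf_amplitude(x + separation / 2.0, width)
    return abs(a1 + a2) ** 2


def incoherent_intensity(x, separation, width):
    """Two point objects with uncorrelated phase: intensities are added."""
    return (psf_amplitude(x - separation / 2.0, width) ** 2
            + psf_amplitude(x + separation / 2.0, width) ** 2)


if __name__ == "__main__":
    s, w = 1.0, 1.0  # separation equal to the PSF width, illustrative only
    print("midpoint, coherent in phase     :", coherent_intensity(0.0, s, w))
    print("midpoint, coherent out of phase :", coherent_intensity(0.0, s, w, math.pi))
    print("midpoint, incoherent            :", incoherent_intensity(0.0, s, w))
```

Midway between the two points, the coherent in-phase intensity is twice the incoherent value, while for a relative phase of $\pi$ it vanishes. The incoherent result does not depend on the phase at all.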

It is useful to study this in the Fourier space of spatial frequencies.
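We define the Fourier transforms

\[
\begin{aligned}
\tilde{I}_I(\mathbf{K}) &= \int d^2\xi \, I_I(\boldsymbol{\xi}) \, e^{-i\mathbf{K} \cdot \boldsymbol{\xi}} \\
\tilde{I}_G(\mathbf{K}) &= \int d^2\xi \, I_G(\boldsymbol{\xi}) \, e^{-i\mathbf{K} \cdot \boldsymbol{\xi}} \\
\tilde{J}(\mathbf{K}) &= \int d^2\xi \, J(\boldsymbol{\xi}) \, e^{-i\mathbf{K} \cdot \boldsymbol{\xi}},
\end{aligned} \tag{3.309}
\]

where $\mathbf{K}$ again represents $2\pi$ times the spatial
frequency. Applying the convolution theorem (see Appendix A) to
(3.308), we obtain

\[
\tilde{I}_I(\mathbf{K}) = \tilde{I}_G(\mathbf{K}) \, \tilde{J}(\mathbf{K}). \tag{3.310}
\]

In words, the transform of the image intensity is the transform of
the ideal geometric image intensity, modulated by the transform
of the point spread function. We now form the ratio

\[
O(\mathbf{K}) = \frac{\tilde{J}(\mathbf{K})}{\tilde{J}(0)}, \tag{3.311}
\]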


called the optical transfer function or OTF. Physically, it rep- 
resents the normalized spatial frequency response of the optical 
system with respect to intensity. Its modulus | O(K) | is called the 
modulation transfer function or MTF. We can gain an appre- 
ciation of the physical significance by relating J(K) back to the 
amplitude transfer function (ATF) derived in the previous section. 
This was denoted H(K). From (3.307, 3.309), we write 


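\[
\tilde{J}(\mathbf{K}) = \int d^2r \, |H(\mathbf{r})|^2 \, e^{-i\mathbf{K} \cdot \mathbf{r}}. \tag{3.312}
\]

The point spread function $H(\mathbf{r})$ can be expressed as the
inverse Fourier transform of the amplitude transfer function
$\tilde{H}(\mathbf{K})$ as

\[
H(\mathbf{r}) = \frac{1}{(2\pi)^2} \int d^2K' \, \tilde{H}(\mathbf{K}') \, e^{i\mathbf{K}' \cdot \mathbf{r}}. \tag{3.313}
\]

Substituting this into (3.312) and interchanging the order of
integrations, we obtain

\[
\tilde{J}(\mathbf{K}) = \frac{1}{(2\pi)^2} \int d^2K' \, \tilde{H}(\mathbf{K}') \int d^2K'' \, \tilde{H}^*(\mathbf{K}'')
\left[ \frac{1}{(2\pi)^2} \int d^2r \, e^{-i\mathbf{r} \cdot (\mathbf{K}'' - \mathbf{K}' + \mathbf{K})} \right]. \tag{3.314}
\]

We recognize the expression in square brackets as a Dirac delta
function $\delta(\mathbf{K}'' - \mathbf{K}' + \mathbf{K})$, in which
case

\[
\tilde{J}(\mathbf{K}) = \frac{1}{(2\pi)^2} \int d^2K' \, \tilde{H}(\mathbf{K}') \, \tilde{H}^*(\mathbf{K}' - \mathbf{K}). \tag{3.315}
\]

Figure 3.15: Modulation transfer function, round aperture, paraxial
approximation.

Mathematically, $\tilde{J}(\mathbf{K})$ is proportional to the
autocorrelation function of $\tilde{H}(\mathbf{K})$ in K-space.
Substituting (3.297), we obtain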


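\[
\tilde{J}(\mathbf{K}) = \frac{1}{(2\pi)^2} \int d^2K' \,
P\left[ -\frac{f_2}{k} \mathbf{K}' \right] P\left[ -\frac{f_2}{k} \left( \mathbf{K}' - \mathbf{K} \right) \right], \tag{3.316}
\]

recalling that M is the magnification, P is the pupil function,
$f_2$ is the focal length of the final lens of the equivalent
confocal system, and $k = 2\pi/\lambda$ is the wave number of the
particle. We define a two-vector position $\mathbf{r}$ by

\[
\mathbf{r} = -\frac{f_2}{k} \mathbf{K}, \qquad d^2K = \left( \frac{k}{f_2} \right)^2 d^2r, \tag{3.317}
\]

with the same substitution applied to the primed integration
variable. It follows that

\[
\tilde{J}(\mathbf{K}) = \left( \frac{k}{2\pi f_2} \right)^2 \int d^2r' \, P(\mathbf{r}') \, P(\mathbf{r}' - \mathbf{r}). \tag{3.318}
\]

Geometrically, the integral on the right is the common area of the
pupil with itself displaced by $\mathbf{r}$.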
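
The geometric statement above can be checked directly for a round pupil. The sketch below is a minimal numerical illustration, with hypothetical function names: it counts grid points lying inside both a disc of radius a and a copy displaced by r, and compares the result with the standard closed form for the overlap area of two equal circles, $2a^2 \left[ \arccos(r/2a) - (r/2a)\sqrt{1 - (r/2a)^2} \right]$.

```python
import math


def overlap_area_closed_form(r, a):
    """Common area of two discs of radius a whose centres are r apart."""
    s = r / (2.0 * a)
    if s >= 1.0:
        return 0.0
    return 2.0 * a * a * (math.acos(s) - s * math.sqrt(1.0 - s * s))


def overlap_area_grid(r, a, n=400):
    """Estimate of the integral of P(r') P(r' - r) for a round pupil:
    count midpoints of an n x n grid on [-a, a]^2 that lie inside both
    the pupil and its copy displaced by r along the x axis."""
    h = 2.0 * a / n
    count = 0
    for i in range(n):
        x = -a + (i + 0.5) * h
        for j in range(n):
            y = -a + (j + 0.5) * h
            if x * x + y * y <= a * a and (x - r) ** 2 + y * y <= a * a:
                count += 1
    return count * h * h


if __name__ == "__main__":
    a = 1.0
    for r in (0.0, 0.4, 0.8, 1.6):
        print(f"r = {r:.1f}  grid = {overlap_area_grid(r, a):.4f}  "
              f"closed form = {overlap_area_closed_form(r, a):.4f}")
```

The overlap equals the full pupil area $\pi a^2$ at zero displacement and falls to zero once the displacement reaches the pupil diameter 2a.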


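An important special case is a round aperture, for which the pupil
function is

\[
P(r) = 1 \tag{3.319}
\]

for $0 \le r \le a$, and zero for $r > a$. It follows that

\[
\tilde{J}(\mathbf{K}) = \left( \frac{k}{2\pi f_2} \right)^2 2a^2
\left[ \arccos\left( \frac{r}{2a} \right) - \frac{r}{2a} \sqrt{1 - \left( \frac{r}{2a} \right)^2} \right]. \tag{3.320}
\]

This depends only on the magnitude $r = |\mathbf{r}|$, because of the
radial symmetry of the pupil function P(r). Substituting
$r = f_2 K/k$, we obtain

\[
\tilde{J}(K) = 2 \left( \frac{ka}{2\pi f_2} \right)^2
\left[ \arccos\left( \frac{f_2 K}{2ka} \right) - \frac{f_2 K}{2ka} \sqrt{1 - \left( \frac{f_2 K}{2ka} \right)^2} \right], \tag{3.321}
\]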

where this is a function of the magnitude K. The optical transfer 
function is 


$$O(K) = \frac{2}{\pi} \left[ \arccos\!\left(\frac{f_2 K}{2ka}\right) - \frac{f_2 K}{2ka}\sqrt{1-\left(\frac{f_2 K}{2ka}\right)^2}\, \right]. \tag{3.322}$$

This is unity at K = 0 as required, and zero for 

$$K \ge K_c = \frac{2ka}{f_2}, \tag{3.323}$$

where $K_c$ represents the cutoff value of K. No spatial frequencies 
above this value are transmitted by the optical system. We notice 
that $a/f_2$ is the tangent of the semiangle subtended by the pupil 
at the Gaussian image plane. The modulation transfer function 
(MTF) is plotted in Figure 3.15. 
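

As a rough numerical illustration of the round-aperture result, the following sketch evaluates the optical transfer function (3.322) and its cutoff (3.323). The function name `otf_round_aperture` and all numerical values (wavelength, aperture radius, focal length) are assumptions chosen only for illustration; they are not taken from the text.

```python
import math

def otf_round_aperture(K, k, a, f2):
    """Paraxial OTF of a round aperture, following (3.322)."""
    Kc = 2.0 * k * a / f2          # cutoff spatial frequency, (3.323)
    s = abs(K) / Kc                # normalized frequency f2*K/(2*k*a)
    if s >= 1.0:
        return 0.0                 # nothing is transmitted above cutoff
    return (2.0 / math.pi) * (math.acos(s) - s * math.sqrt(1.0 - s * s))

# Illustrative numbers only (assumed, not from the text).
wavelength = 2.5e-12               # m
k = 2.0 * math.pi / wavelength     # wave number
a, f2 = 10e-6, 1e-3                # aperture radius and focal length, m
Kc = 2.0 * k * a / f2

for frac in (0.0, 0.25, 0.5, 0.75, 1.0):
    value = otf_round_aperture(frac * Kc, k, a, f2)
    print(f"K/Kc = {frac:4.2f}  MTF = {value:.4f}")
```

Because the result depends on K only through the ratio $K/K_c$, the printed curve has the same shape for any choice of wavelength, aperture radius, and focal length.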


3.3.7 The wave front aberration function 


An image point in the paraxial approximation is formed by a spher- 
ical wave converging on an ideal point in the Gaussian image plane. 
With aberrations present, the image is blurred and displaced from 
its ideal position. The aberrated image is formed by a wave which 
is distorted from an ideal spherical wave. 


This is shown schematically in Figure 3.16. An ideal spherical 
wave front $S_i$ fills the angular acceptance cone of the aperture. 
A ray emanates from every point along the wave front in a direction 
locally perpendicular to the wave front. These rays converge 
to an ideal point in the Gaussian image plane, as shown by the 
broken lines in the figure. The aberrated wave front $S_a$ is locally 
displaced from the ideal wave front by a distance $\chi$, which we 
designate the wave front aberration function. The rays emanating from 
the aberrated wave front converge to a region which is blurred and 
displaced from the ideal image point in general. These rays are 
depicted by the solid lines in the figure. 


As discussed earlier, every optical system, however complicated, 
can be analyzed in terms of an equivalent system consisting of two 
ideal lenses. This is shown schematically in Figure 3.17. A point 
object is located in an object plane at O. The object plane 
coincides with the front focal plane of the first lens $L_1$. A physical 
aperture is located at the back focal plane of the first lens $L_1$, 
which coincides with the front focal plane of the second lens $L_2$. 

Figure 3.16: Wave front aberration. 


The aperture plane is labeled A in the figure. In the absence of 
aberrations and in the limit of geometrical optics, an ideal point 
image I is formed from a point object O. The magnification M is 
defined as the ratio of the respective image and object heights. 
It is easy to verify by similar triangles in the figure that this is 
identical with the ratio $-f_2/f_1$ of the respective lens focal lengths, 
where the minus sign accounts for the inversion of the image. 


Each ray emanating from the object O intersects the aperture 
plane A at a unique transverse position $(x_A, y_A)$. Each point 
$(x_A, y_A)$ in turn maps to a unique ray slope $(x_I', y_I')$ where the ray 
intersects the Gaussian image plane. The wave front aberration 
function $\chi$ can be regarded as a function of transverse coordinates 
$(x_A, y_A)$ in the aperture plane A. 


Figure 3.17: Equivalent confocal system. 


The product $k\chi$ is the phase shift associated with the aberration, 
where k is the wave number given by $k = 2\pi/\lambda$. For a well-designed 
optical system $\chi$ is a small fraction of the wavelength $\lambda$. Equivalently, 
the phase shift due to aberrations is much less than $2\pi$. All 
information about the aberrations is contained in the wave front 
aberration function $\chi$. This is discussed in many books [11], [16], 
[67]. 


In the equivalent confocal system we consider the aberration to 
be entirely introduced in the aperture plane as an abrupt phase 
shift. This is conceptually equivalent to introducing a thin phase 
plate in the aperture plane, which shifts the phase by an amount 
$k\chi$. This is depicted in Figure 3.17. 


The wave front aberration $\chi(x_A, y_A)$ is a scalar function defined 
in a plane. Assuming a round aperture, the function $\chi$ can in 
principle be expanded in a series of Zernike polynomials. Zernike 
polynomials are orthogonal functions defined on the unit disk 
$0 \le \rho \le 1$, where $(\rho, \phi)$ are polar coordinates in a plane. Here $\rho$ is 
defined as the radial ray position in the aperture plane divided by 
the aperture radius in the equivalent confocal system. Any arbitrary 
function $F(\rho, \phi)$ which is defined on the unit disk $0 \le \rho \le 1$ 
and $0 \le \phi \le 2\pi$ can in principle be expanded as a linear combination 
of Zernike polynomials. The method is discussed below. 


The Zernike polynomials $Z_n^{m}(\rho, \phi)$ are defined [1] by 


$$\begin{aligned}
Z_n^{m}(\rho,\phi) &= R_n^m(\rho)\cos(m\phi) \\
Z_n^{-m}(\rho,\phi) &= R_n^m(\rho)\sin(m\phi),
\end{aligned} \tag{3.324}$$


where n and m are non-negative integers with $n \ge m$. The radial 
functions $R_n^m(\rho)$ are defined as 


$$R_n^m(\rho) = \sum_{k=0}^{(n-m)/2} \frac{(-1)^k\,(n-k)!}{k!\,[(n+m)/2-k]!\,[(n-m)/2-k]!}\; \rho^{\,n-2k} \tag{3.325}$$


for $(n - m)$ even, and $R_n^m = 0$ for $(n - m)$ odd. It is easy to show 
that $R_n^m(1) = 1$, and therefore, $-1 \le Z_n^{\pm m}(\rho,\phi) \le 1$. 


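The following orthogonality relations can be shown: 


$$\begin{aligned}
\int_0^1 R_n^m(\rho)\, R_{n'}^m(\rho)\, \rho\, d\rho &= \frac{1}{2n+2}\, \delta_{n,n'} \\
\int_0^{2\pi} \cos(m\phi)\cos(m'\phi)\, d\phi &= \epsilon_m\, \pi\, \delta_{|m|,|m'|} \\
\int_0^{2\pi} \sin(m\phi)\sin(m'\phi)\, d\phi &= (-1)^{m+m'}\, \pi\, \delta_{|m|,|m'|} \quad (m \ne 0) \\
\int_0^{2\pi} \cos(m\phi)\sin(m'\phi)\, d\phi &= 0,
\end{aligned} \tag{3.326}$$


where $\delta_{i,j}$ is the Kronecker delta, and $\epsilon_m = 2$ if m = 0, and $\epsilon_m = 1$ 
if $m \ne 0$. It follows that 


$$\int_0^1 \int_0^{2\pi} Z_n^m(\rho,\phi)\, Z_{n'}^{m'}(\rho,\phi)\, \rho\, d\rho\, d\phi = \frac{\epsilon_m\, \pi}{2n+2}\, \delta_{n,n'}\, \delta_{m,m'}, \tag{3.327}$$
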
where $(n - m)$ and $(n' - m')$ must both be even. 

We define an arbitrary function $F(\rho,\phi)$ as the linear combination 
of Zernike polynomials as follows: 


$$F(\rho,\phi) = \sum_{m,n} \left[ a_{mn}\, Z_n^{m}(\rho,\phi) + b_{mn}\, Z_n^{-m}(\rho,\phi) \right], \tag{3.328}$$


where the coefficients $a_{mn}$ and $b_{mn}$ are considered arbitrary to this 
point. This defines a Zernike transform. Using the above orthogonality 
relations it is possible to invert these equations as follows: 


$$\begin{aligned}
a_{mn} &= \frac{2n+2}{\epsilon_m\, \pi} \int_0^1 d\rho\, \rho \int_0^{2\pi} d\phi\, F(\rho,\phi)\, Z_n^{m}(\rho,\phi) \\
b_{mn} &= \frac{2n+2}{\epsilon_m\, \pi} \int_0^1 d\rho\, \rho \int_0^{2\pi} d\phi\, F(\rho,\phi)\, Z_n^{-m}(\rho,\phi).
\end{aligned} \tag{3.329}$$


These two equations define the inverse Zernike transform. 
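

The inverse transform can be tried out numerically. The sketch below is only an illustration under stated assumptions: it uses a simple midpoint rule on a polar grid, writes three low-order modes out in closed form, and applies the normalization $(2n+2)/(\epsilon_m \pi)$ to a synthetic function whose coefficients are known in advance. The names `MODES`, `coefficient`, and `F`, and the coefficient values 0.30 and -0.10, are hypothetical.

```python
import math

# Three modes written out in closed form. Each entry: (n, m, Z(rho, phi)).
MODES = {
    "defocus Z_2^0": (2, 0, lambda r, p: 2 * r**2 - 1),
    "coma Z_3^1":    (3, 1, lambda r, p: (3 * r**3 - 2 * r) * math.cos(p)),
    "coma Z_3^-1":   (3, 1, lambda r, p: (3 * r**3 - 2 * r) * math.sin(p)),
}

def coefficient(F, n, m, Z, steps=200):
    """Midpoint-rule estimate of (2n+2)/(eps_m*pi) times the disk integral of F*Z."""
    eps = 2.0 if m == 0 else 1.0            # eps_m = 2 for m = 0, otherwise 1
    d_rho, d_phi = 1.0 / steps, 2.0 * math.pi / steps
    total = 0.0
    for i in range(steps):
        rho = (i + 0.5) * d_rho
        for j in range(steps):
            phi = (j + 0.5) * d_phi
            total += F(rho, phi) * Z(rho, phi) * rho   # area element rho drho dphi
    return (2 * n + 2) / (eps * math.pi) * total * d_rho * d_phi

# Synthetic function with known coefficients (hypothetical values).
def F(rho, phi):
    return 0.30 * (2 * rho**2 - 1) - 0.10 * (3 * rho**3 - 2 * rho) * math.sin(phi)

recovered = {name: coefficient(F, n, m, Z) for name, (n, m, Z) in MODES.items()}
for name, value in recovered.items():
    print(f"{name:14s} {value:+.4f}")
```

The recovered values should approach the input coefficients as the grid is refined, with the cosine coma term vanishing because it is absent from the synthetic function.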


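The radial functions $R_n^m(\rho)$ are easily obtained by direct substitution 
of the various integer values n and m. The first nine are 


$$\begin{aligned}
R_0^0 &= 1 \\
R_1^1 &= \rho \\
R_2^0 &= 2\rho^2 - 1 \\
R_2^2 &= \rho^2 \\
R_3^1 &= 3\rho^3 - 2\rho \\
R_3^3 &= \rho^3 \\
R_4^0 &= 6\rho^4 - 6\rho^2 + 1 \\
R_4^2 &= 4\rho^4 - 3\rho^2 \\
R_4^4 &= \rho^4.
\end{aligned} \tag{3.330}$$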
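

The direct substitution can be checked mechanically. The following sketch evaluates the finite sum (3.325) and compares it with the nine closed forms listed in (3.330) at a few sample radii. The function name `radial` and the lookup `TABLE` are illustrative names, not part of the text.

```python
from math import factorial

def radial(n, m, rho):
    """Zernike radial function R_n^m(rho) from the finite sum (3.325)."""
    if (n - m) % 2:
        return 0.0                      # R_n^m vanishes for odd n - m
    return sum(
        (-1) ** k * factorial(n - k)
        / (factorial(k) * factorial((n + m) // 2 - k) * factorial((n - m) // 2 - k))
        * rho ** (n - 2 * k)
        for k in range((n - m) // 2 + 1)
    )

# Closed forms listed in (3.330), keyed by (n, m).
TABLE = {
    (0, 0): lambda r: 1.0,
    (1, 1): lambda r: r,
    (2, 0): lambda r: 2 * r**2 - 1,
    (2, 2): lambda r: r**2,
    (3, 1): lambda r: 3 * r**3 - 2 * r,
    (3, 3): lambda r: r**3,
    (4, 0): lambda r: 6 * r**4 - 6 * r**2 + 1,
    (4, 2): lambda r: 4 * r**4 - 3 * r**2,
    (4, 4): lambda r: r**4,
}

max_error = max(
    abs(radial(n, m, rho) - closed(rho))
    for (n, m), closed in TABLE.items()
    for rho in (0.0, 0.3, 0.7, 1.0)
)
print("largest deviation from (3.330):", max_error)
```

The same function also confirms numerically that each listed radial function equals unity at $\rho = 1$.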


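Substituting, the first fifteen Zernike polynomials with corresponding 
optical aberrations are as follows: 


$$\begin{aligned}
Z_0^{0} &= 1 && \text{piston} \\
Z_1^{-1} &= \rho \sin\phi && \text{y-tilt} \\
Z_1^{1} &= \rho \cos\phi && \text{x-tilt} \\
Z_2^{-2} &= \rho^2 \sin(2\phi) && \text{astigmatism} \\
Z_2^{0} &= 2\rho^2 - 1 && \text{defocus} \\
Z_2^{2} &= \rho^2 \cos(2\phi) && \text{astigmatism} \\
Z_3^{-3} &= \rho^3 \sin(3\phi) && \text{trefoil} \\
Z_3^{-1} &= \rho\,(3\rho^2 - 2) \sin\phi && \text{coma} \\
Z_3^{1} &= \rho\,(3\rho^2 - 2) \cos\phi && \text{coma} \\
Z_3^{3} &= \rho^3 \cos(3\phi) && \text{trefoil} \\
Z_4^{-4} &= \rho^4 \sin(4\phi) && \\
Z_4^{-2} &= \rho^2 (4\rho^2 - 3) \sin(2\phi) && \\
Z_4^{0} &= 6\rho^4 - 6\rho^2 + 1 && \text{spherical} \\
Z_4^{2} &= \rho^2 (4\rho^2 - 3) \cos(2\phi) && \\
Z_4^{4} &= \rho^4 \cos(4\phi), &&
\end{aligned} \tag{3.331}$$


where we recall that $0 \le \rho \le 1$ and $0 \le \phi \le 2\pi$. 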


It is useful in some cases to express these in Cartesian coordinates, 
again confined to the unit disk. To do this we first expand 
the trigonometric functions according to the well-known relations 
as follows: 


$$\begin{aligned}
\sin(2\phi) &= 2 \sin\phi \cos\phi \\
\cos(2\phi) &= \cos^2\phi - \sin^2\phi \\
\sin(3\phi) &= \sin\phi\,(3\cos^2\phi - \sin^2\phi) \\
\cos(3\phi) &= \cos\phi\,(\cos^2\phi - 3\sin^2\phi) \\
\sin(4\phi) &= 4 \sin\phi \cos\phi\,(\cos^2\phi - \sin^2\phi) \\
\cos(4\phi) &= \cos^4\phi - 6 \sin^2\phi \cos^2\phi + \sin^4\phi.
\end{aligned} \tag{3.332}$$


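Substituting the Cartesian coordinates $x = \rho \cos\phi$ and $y = \rho \sin\phi$ 
we immediately obtain 


$$\begin{aligned}
Z_0^{0} &= 1 \\
Z_1^{-1} &= y \\
Z_1^{1} &= x \\
Z_2^{-2} &= 2xy \\
Z_2^{0} &= 2(x^2 + y^2) - 1 \\
Z_2^{2} &= x^2 - y^2 \\
Z_3^{-3} &= y\,(3x^2 - y^2) \\
Z_3^{-1} &= y\,[3(x^2 + y^2) - 2] \\
Z_3^{1} &= x\,[3(x^2 + y^2) - 2] \\
Z_3^{3} &= x\,(x^2 - 3y^2) \\
Z_4^{-4} &= 4xy\,(x^2 - y^2) \\
Z_4^{-2} &= 2xy\,[4(x^2 + y^2) - 3] \\
Z_4^{0} &= 6(x^2 + y^2)^2 - 6(x^2 + y^2) + 1 \\
Z_4^{2} &= (x^2 - y^2)\,[4(x^2 + y^2) - 3] \\
Z_4^{4} &= x^4 - 6x^2y^2 + y^4.
\end{aligned} \tag{3.333}$$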
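

A quick way to guard against algebra slips in the Cartesian forms is to compare them with the polar forms at random points of the unit disk. The sketch below does this for the higher-order modes only; the dictionary `PAIRS` and the sampling scheme are illustrative choices, not part of the text.

```python
import math
import random

# (polar form, Cartesian form) for several modes of order three and four.
PAIRS = {
    "Z_3^-3": (lambda r, p: r**3 * math.sin(3 * p),
               lambda x, y: y * (3 * x**2 - y**2)),
    "Z_3^3":  (lambda r, p: r**3 * math.cos(3 * p),
               lambda x, y: x * (x**2 - 3 * y**2)),
    "Z_4^-4": (lambda r, p: r**4 * math.sin(4 * p),
               lambda x, y: 4 * x * y * (x**2 - y**2)),
    "Z_4^-2": (lambda r, p: r**2 * (4 * r**2 - 3) * math.sin(2 * p),
               lambda x, y: 2 * x * y * (4 * (x**2 + y**2) - 3)),
    "Z_4^2":  (lambda r, p: r**2 * (4 * r**2 - 3) * math.cos(2 * p),
               lambda x, y: (x**2 - y**2) * (4 * (x**2 + y**2) - 3)),
    "Z_4^4":  (lambda r, p: r**4 * math.cos(4 * p),
               lambda x, y: x**4 - 6 * x**2 * y**2 + y**4),
}

random.seed(0)                          # fixed seed so the run is repeatable
worst = 0.0
for _ in range(1000):
    rho, phi = random.random(), 2 * math.pi * random.random()
    x, y = rho * math.cos(phi), rho * math.sin(phi)
    for polar, cartesian in PAIRS.values():
        worst = max(worst, abs(polar(rho, phi) - cartesian(x, y)))

print("largest polar/Cartesian mismatch:", worst)
```

A mismatch at the level of floating-point rounding indicates that the two sets of expressions agree.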


Our goal is to express an arbitrary phase shift $k\chi(x_A, y_A)$ in terms 
of a corresponding set of Zernike coefficients $(a_{mn}, b_{mn})$, where 
$k = 2\pi/\lambda$ is the wave number. To this end we define a set of 
coordinate transformations as follows: 


$$\begin{aligned}
x &= \rho \cos\phi = x_A/R_A \\
y &= \rho \sin\phi = y_A/R_A \\
\rho &= \sqrt{x^2 + y^2},
\end{aligned} \tag{3.334}$$


where $R_A$ is the radius of the aperture in the equivalent confocal 
system. Based on this, we define the formal functional substitutions 


$$\begin{aligned}
G(x,y) &= F(\rho,\phi) \\
k\chi(x_A, y_A) &= G(x,y).
\end{aligned} \tag{3.335}$$


Finally, we define the correspondence between the physical system 
and the equivalent confocal system according to 


$$\begin{aligned}
x_I' &= x_A/f_2 \\
y_I' &= y_A/f_2 \\
M &= -f_2/f_1,
\end{aligned} \tag{3.336}$$

where we assume the final ray slopes $(x_I', y_I')$ and the magnification 
M are identical for the physical system and the equivalent confocal 
system. 


This completes the correspondence between the aberrations of 
the physical system and the Zernike coefficients $(a_{mn}, b_{mn})$. The 
Zernike coefficients uniquely specify the aberrations. Strictly 
speaking, these coefficients are a function of object position O. In 
a well-designed system, the coefficients do not vary greatly across 
the object plane. 


3.3.8 Relationship between diffraction and the 
Heisenberg uncertainty principle 


We have seen in the foregoing sections that diffraction and interference 
follow naturally from the Fresnel-Kirchhoff relation (3.244). 
In turn, this relation represents a stationary-state solution of the 
spatial part of Schrödinger’s equation (3.230) for a free particle 
wave function. It is of great interest to consider diffraction and 
interference from a closely related point of view, namely, Heisen- 
berg’s uncertainty principle. This is the subject of the present 
section. 


We begin with diffraction from two parallel slits. This is shown 
schematically in Figure 3.18. A plane wave is incident from the 
top of the figure, with propagation direction normally incident on 
a screen S. The screen has two infinitely long parallel slits, oriented 
out of the page. A diffracted ray I emanates from the left slit, and 
a diffracted ray II emanates from the right slit. The two rays are 
assumed to be parallel to one another, and propagate at an angle 
$\theta$ relative to the central axis. A thin lens L causes the two rays to 
converge at a viewing plane P. The axial spacing between S and L 
is assumed to be equal to the axial spacing between L and P, and 
both are equal to the focal length of the lens. The plane P thus 
represents the diffraction or Fourier plane of the lens. In this 
configuration every viewing angle $\theta$ is mapped to a unique transverse 
coordinate x in the plane P. 


Figure 3.18: Two-slit diffraction with destructive interference. 


Scanning the viewing angle $\theta$ gives rise to an intensity distribution 
I(x) in the plane P. The functional form for I(x) follows directly 
from (3.279). It is left as an exercise to solve for I(x). The peaks 
represent constructive interference of the two waves, and the valleys 
represent destructive interference. It is evident from the figure 
that the two rays have a path length difference given by $d \sin\theta$, 
where d is the separation distance of the two slits. Constructive 
interference occurs when this path length difference is equal to an 
integral number of wavelengths $\lambda$. Equivalently, 


$$d \sin\theta = n\lambda, \qquad n = 0, \pm 1, \pm 2, \pm 3, \ldots \tag{3.337}$$


Destructive interference occurs when the path length difference is 
equal to a half-odd number of wavelengths. Equivalently, 


$$d \sin\theta = \left(n + \tfrac{1}{2}\right)\lambda, \qquad n = 0, \pm 1, \pm 2, \pm 3, \ldots \tag{3.338}$$


Next we consider the specific viewing angle $\theta$ for which the intensity 
distribution I(x) has its first minimum. This is shown in the 
figure, where the path length difference between the two rays is 
$\lambda/2$, corresponding to n = 0 in (3.338). We assume the particle has 
momentum p, which is related to its de Broglie wavelength $\lambda$ by 


$$p = \frac{h}{\lambda}, \tag{3.339}$$


where h is Planck’s constant. This momentum has a transverse 
component $\Delta p$, which satisfies 


$$\frac{\Delta p}{p} = \sin\theta. \tag{3.340}$$


Since $\Delta p$ represents the half-width of the first interference fringe 
with n = 0, we ascribe an uncertainty $\Delta p$ to the transverse momentum 
of the particle. 


Separately, one has no knowledge about which of the two slits 
the particle passed through. Following Feynman [30, Chapter 1, 
Volume 3], any attempt to measure which slit the particle passed 
through would perturb the wave function, thereby irreparably 
destroying the interference. Consequently, we ascribe an uncertainty 
$\Delta x = d/2$ to the transverse position of the particle. It is left as a 
brief exercise to show from (3.338, 3.339, 3.340) that 


$$\Delta x\, \Delta p = \frac{h}{4}. \tag{3.341}$$

This represents a statement of Heisenberg’s uncertainty principle. 


The assignments of $\Delta x$ and $\Delta p$ are somewhat arbitrary. For example, 
one might assign the full-widths, instead of the half-widths 
for $\Delta x$ and $\Delta p$, in which case the right-hand side in (3.341) would 
be multiplied by a factor of four. The uncertainty relation (3.341) 
is therefore properly regarded as an approximation. 
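

The two-slit estimate can be checked with a few lines of arithmetic. The sketch below takes the first minimum at a path difference of $\lambda/2$, uses the de Broglie relation for p, and assigns the half-width values $\Delta x = d/2$ and $\Delta p = p \sin\theta$. The function name and the sample wavelength and slit separation are hypothetical; the product $\Delta x\, \Delta p / h$ should come out close to one quarter regardless of those choices.

```python
h = 6.62607015e-34        # Planck constant in J s (exact SI value)

def two_slit_uncertainty(wavelength, d):
    """Return (dx, dp) for the first minimum of a two-slit pattern."""
    sin_theta = wavelength / (2.0 * d)   # path difference d*sin(theta) = lambda/2
    p = h / wavelength                   # de Broglie relation
    dp = p * sin_theta                   # transverse momentum half-width
    dx = d / 2.0                         # half the slit separation
    return dx, dp

# Hypothetical numbers: 3.7 pm wavelength, 100 nm slit separation.
dx, dp = two_slit_uncertainty(3.7e-12, 100e-9)
ratio = dx * dp / h
print("dx * dp / h =", ratio)
```

Changing the wavelength or the slit separation leaves the ratio unchanged, which is the content of the uncertainty relation in this simple picture.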


The relation (3.339) applies to a photon, and equivalently to a 
massive particle. It follows that the uncertainty relation (3.341) 
applies to both a photon and a massive particle. Alternatively, the 
momentum can be expressed as 


$$p = \hbar k \tag{3.342}$$


for a photon and massive particle, where k is the wave number 
given as $k = 2\pi/\lambda$. The uncertainty relation (3.341) becomes 


$$\Delta x\, \Delta k \approx \frac{\pi}{2}, \tag{3.343}$$


given the above assumptions. This can be regarded as a general 
property of waves, not just a photon or particle. It applies to many 
wavelike phenomena, including water waves and sound waves, as 
examples. The usual coherent superposition of waves with different 
values of k to form a wave packet applies to these different 
types of waves as well. 


Chapter 4 


Particle scattering 


The interaction of fast charged particles with matter provides the 
basic physical mechanism underlying most applications of charged 
particle beam instruments. In this interaction, an incident particle 
strikes a target, transferring momentum and energy. The incident 
particle is completely characterized by its rest mass, charge, mo- 
mentum, and spin polarization. The target can consist of bulk 
material (solid, liquid, or vapor), a single atom, a molecule, or a 
second particle (composite or pointlike). The interaction can be 
governed by the strong, weak, Coulomb, or gravitational forces. 
The gravitational force is too weak to be important for charged 
particles on the laboratory scale of dimensions. However the clas- 
sical Kepler problem is formally identical to Coulomb scattering 
between two individual charged particles via the inverse square 
dependence of the instantaneous force on the separation. In all 
cases, the interaction can be used to probe the physical or chemi- 
cal properties of the target. 


In some cases where the rest mass of the incident particle is much 
smaller than the target particle, the incident particle transfers neg- 
ligible energy to the target. Such an event is known as elastic 
scattering. An example is the angular deflection of a fast electron 
by the screened Coulomb potential of an atomic nucleus. This 
provides the basic contrast mechanism of a transmission electron 
microscope. In elastic scattering, the phase relationship of the incident 
particle before and after the scattering event is preserved. 
This phase coherence enables the study of a crystalline target 
through electron diffraction. 


A separate example of nearly elastic scattering is the Coulomb 
scattering of an energetic alpha particle by a heavy nucleus. This 
is the mechanism originally used by Rutherford [59] to characterize 
an atomic nucleus as a compact, massive, multiply charged body. 
In elastic scattering, the magnitude of the momentum of the inci- 
dent particle is unchanged by the scattering, but the direction can 
be significantly changed. Consequently, significant momentum can 
be transferred. 


Alternatively, the incident particle can transfer significant energy 
to the target. Such an event is known as inelastic scattering. This 
is much more complex than elastic scattering, because a rich va- 
riety of secondary processes can result. For example, an incident 
fast electron can excite the electronic states of a target material, 
giving rise to an excited-state atom, secondary photon, free elec- 
tron, electron-hole pair (exciton), or plasmon. Performing electron 
energy loss spectroscopy on the scattered electron yields informa- 
tion on the chemical and physical nature of the target material. 


Separately, the resulting secondary electron current forms the ba- 
sic contrast mechanism in the scanning electron microscope, or 
a scanning helium ion microscope. Alternatively, bombarding a 
material with an ion beam causes secondary ions of the target 
material to be ejected. Performing secondary ion mass spectrom- 
etry gives direct information about the atomic composition of the 
target material. 


Separately still, an incident electron or ion beam can be used to 
chemically alter a target polymer film in a useful way. The pat- 
terned film is then used as a binary mask in the process known as 
lithography. A focused electron beam or a focused helium ion beam 
can form a very fine, sharp writing pencil. Lithographic structures 
down to a few nanometers in size have been produced by this 
method. Ultimate lithographic resolution is limited by the interaction 
of the incident particle with the recording medium. Improving 
this resolution remains a topic of current research. 


Alternatively, a focused ion beam can selectively remove material 
from a bulk sample, enabling fabrication of useful structures. A 
focused electron beam can be used simultaneously in the same 
(dual beam) instrument to produce an SEM image, thus enabling 
in situ observation and endpoint detection of this removal process. 


Yet another important example of inelastic scattering is the pro- 
duction of a host of particle species in a high energy particle accel- 
erator. This process takes place by the conversion of kinetic energy 
of the incident particle to mass of the products. This has been used 
in ongoing research to deduce the most fundamental properties of 
elementary particles and their interactions. 


An enormous literature exists describing the interaction of charged 
particles with matter. Rather than attempt a comprehensive sum- 
mary, we will confine our attention to two-particle scattering. This 
is the most basic of all scattering processes, and is derived from 
first principles of physics in a straightforward way. Two-particle 
scattering forms the basis of many of the interactions of charged 
particle beams with matter in practical instruments. 


The central problem in the following sections is to calculate the 
momentum and energy transfer resulting from two-particle scatter- 
ing. Closely related to this is the intensity as a function of scatter- 
ing angle measured relative to the direction of the incident particle, 
where this represents the relevant measurable quantity. Although 
the process is fundamentally quantum mechanical, a great deal of 
intuitive understanding can be gained by first studying the classi- 
cal description, and then proceeding to the quantum mechanical 
description. 


4.1 Classical particle kinematics 


The generic two-particle scattering problem assumes a single parti- 
cle with vector kinetic momentum p, incident on a second particle 
which is initially at rest in the lab frame. The first particle transfers 
momentum and energy to the second particle, and both particles 
exit to a final state. The final state is governed by conservation of 
total momentum and energy, as well as the details of the interac- 
tion. The first particle is assumed to come from a distance which 
is large compared with the dimensions of the interaction volume 
of the two particles. Similarly, both particles exit to a large dis- 
tance in the final state. At large distances the potential energy of 
the interaction can be ignored. In this section we investigate the 
constraints imposed by momentum and energy conservation. This 
general topic is referred to as kinematics. 


We assume the incident particle has rest mass $m_1$, vector kinetic 
momentum $\mathbf{p}_1$, and total energy $E_1$. We assume the stationary 
particle has rest mass $m_2$, vector kinetic momentum $\mathbf{p}_2$, and total 
energy $E_2$. These quantities are related by 


$$\begin{aligned}
E_1^2 &= p_1^2 c^2 + m_1^2 c^4 \\
E_2^2 &= p_2^2 c^2 + m_2^2 c^4,
\end{aligned} \tag{4.1}$$


consistent with special relativity. We assign the value $\mathbf{p}_1 = \mathbf{p}$ for 
the incident particle, where p is assumed to be known a priori, 
along with the rest masses $m_1$ and $m_2$. We assign the value $\mathbf{p}_2 = 0$ 
for the stationary target particle. We assume that neither particle 
has internal degrees of freedom. By implication, we ignore the 
effects of spin in the following analysis. We further assume that 
each particle retains its original rest mass through the collision. 


Since the only force is that which acts between the two parti- 
cles, it follows that the center of mass moves at constant veloc- 
ity. The motion of the center of mass is therefore uninteresting. 
We seek a frame of reference such that the total momentum of 
the two-particle system is zero. We call this system the 
center-of-momentum or CM frame. The scattering is most simply analyzed 
in the CM frame, since the superfluous motion of the center of 
mass does not appear. This is shown in Figure 4.1, where the top 
diagram represents the lab frame, and the middle diagram represents 
the CM frame. We denote the CM frame by primed quantities, 
and the lab frame by unprimed quantities. We assign the 
value v to the vector velocity of the CM measured in the lab frame. 


Figure 4.1: Reference frames for scattering. (Lab frame: $m_1$ with $\mathbf{p}_1 = \mathbf{p}$, $m_2$ with $\mathbf{p}_2 = 0$. CM frame: $\mathbf{p}_1' = \mathbf{p}'$, $\mathbf{p}_2' = -\mathbf{p}'$.) 


The incident momentum and energy are measured in the lab frame, 
the scattering probability is calculated in the CM frame, and the 
scattered intensity is measured as a function of scattering angle in 
the lab frame. Our procedure must therefore consist of transform- 
ing from the lab to the CM frame, then calculating the scattering 
probability as a function of scattering angle in the CM frame, then 
finally transforming back to the lab frame. We adopt the notation 


$$\beta = v/c, \qquad \gamma = \frac{1}{\sqrt{1 - v^2/c^2}} = \frac{1}{\sqrt{1 - \beta^2}}.$$


For brevity of notation, we further make the following substitutions 
for the mass and momentum, respectively: 


$$\begin{aligned}
mc^2 &\to m \\
pc &\to p.
\end{aligned} \tag{4.2}$$


This is equivalent to a system of units where the speed of light is 
c = 1. The reader can transform back to the original quantities at 
any point in the calculation. 


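Lorentz transformation from lab (unprimed) to CM (primed) 
frame gives 


$$\begin{aligned}
p_1' &= \gamma\,(p_1 - \beta E_1) \\
E_1' &= \gamma\,(E_1 - \beta p_1) \\
p_2' &= \gamma\,(p_2 - \beta E_2) \\
E_2' &= \gamma\,(E_2 - \beta p_2)
\end{aligned} \tag{4.3}$$


for the initial state. This makes use of the fact that momentum 
and energy form a four-vector $p_\mu = (\mathbf{p}, iE/c)$. Substituting the 
assumed values $p_1 = p$, $p_2 = 0$, $E_2 = m_2$, $p_1' = p'$, and $p_2' = -p'$, 
we obtain 


$$\begin{aligned}
p' &= \gamma\,(p - \beta E_1) \\
E_1' &= \gamma\,(E_1 - \beta p) \\
-p' &= \gamma\,(-\beta m_2) \\
E_2' &= \gamma\, m_2.
\end{aligned} \tag{4.4}$$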


Solving the first and third equations for $\beta$, we obtain 


$$\beta = \frac{p}{E_1 + m_2} = \frac{\sqrt{E_1^2 - m_1^2}}{E_1 + m_2} \tag{4.5}$$

and 

$$\gamma = \frac{E_1 + m_2}{\sqrt{m_1^2 + m_2^2 + 2E_1 m_2}}. \tag{4.6}$$


Substituting these into the second and fourth equations of (4.4) 
yields the energies of the two individual particles in the CM frame, 
given respectively by 


$$\begin{aligned}
E_1' &= \frac{m_1^2 + E_1 m_2}{\sqrt{m_1^2 + m_2^2 + 2E_1 m_2}} \\
E_2' &= \frac{m_2^2 + E_1 m_2}{\sqrt{m_1^2 + m_2^2 + 2E_1 m_2}}.
\end{aligned} \tag{4.7}$$


The sum $E'$ of the energies of the two particles in the CM frame 
is thus given by 


$$E' = E_1' + E_2' = \sqrt{m_1^2 + m_2^2 + 2E_1 m_2}. \tag{4.8}$$

We thus obtain several identities which will prove useful later: 

$$\begin{aligned}
E_1' E' &= m_1^2 + E_1 m_2 \\
E_2' E' &= m_2^2 + E_1 m_2 \\
p' E' &= m_2\, p,
\end{aligned} \tag{4.9}$$


where 

$$E'^2 = (E_1 + m_2)^2 - p^2 = m_1^2 + m_2^2 + 2E_1 m_2. \tag{4.10}$$


We have thus succeeded in calculating all relevant initial quantities 
$p'$, $E_1'$, and $E_2'$ in the CM frame from known quantities p, $m_1$, 
$m_2$, $E_1$ in the lab frame. 
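

The chain of results from the Lorentz boost to the CM-frame energies and momentum can be collected into a small function. The sketch below is an illustration only, in units with c = 1. The function name `lab_to_cm`, the dictionary keys, and the example (an electron of 200 keV kinetic energy incident on a proton at rest, with approximate rest energies in MeV) are assumptions made for the example.

```python
import math

def lab_to_cm(p, m1, m2):
    """CM-frame quantities for particle 1 (momentum p) striking particle 2 at rest.

    Units with c = 1, so masses, momenta and energies share one unit.
    """
    E1 = math.sqrt(p * p + m1 * m1)                    # lab energy of particle 1
    beta = p / (E1 + m2)                               # velocity of the CM
    E_cm = math.sqrt(m1 * m1 + m2 * m2 + 2 * E1 * m2)  # total CM energy E'
    gamma = (E1 + m2) / E_cm
    return {
        "beta": beta,
        "gamma": gamma,
        "E_cm": E_cm,
        "E1_cm": (m1 * m1 + E1 * m2) / E_cm,           # E_1'
        "E2_cm": (m2 * m2 + E1 * m2) / E_cm,           # E_2'
        "p_cm": m2 * p / E_cm,                         # p'
    }

# Illustrative case: 200 keV electron on a proton at rest (rest energies in MeV).
m_e, m_p = 0.51099895, 938.27209
p = math.sqrt((m_e + 0.2) ** 2 - m_e ** 2)
cm = lab_to_cm(p, m_e, m_p)
for key, value in cm.items():
    print(f"{key:6s} = {value:.9g}")
```

A useful consistency check is that each particle remains on its mass shell in the CM frame, that is, $E_1'^2 - p'^2 = m_1^2$ and $E_2'^2 - p'^2 = m_2^2$.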


We notice that in the CM frame, the total kinetic energy $T'$ is 
given in terms of the total energy $E'$ by 


$$T' = E' - (m_1 + m_2) = (m_1^2 + m_2^2 + 2E_1 m_2)^{1/2} - (m_1 + m_2). \tag{4.11}$$

In words, the kinetic energy is the total energy minus the energy 
of the rest masses. Separately, the total energy $E_1$ of the incident 
particle in the lab frame can be written as 


$$E_1 = \gamma_1 m_1, \tag{4.12}$$


where $\gamma_1 = 1/\sqrt{1 - \beta_1^2}$ applies to the initial velocity $\beta_1 = v_1/c$ of 
the incident particle measured in the lab frame. This velocity $v_1$ 
is not to be confused with the velocity v of the CM measured in 
the lab frame. Substituting, 


$$\begin{aligned}
T' &= \left[(m_1 + m_2)^2 + 2(\gamma_1 - 1)\, m_1 m_2\right]^{1/2} - (m_1 + m_2) \\
&= (m_1 + m_2)\left[1 + 2(\gamma_1 - 1)\frac{m_1 m_2}{(m_1 + m_2)^2}\right]^{1/2} - (m_1 + m_2).
\end{aligned} \tag{4.13}$$

In the low energy limit, $\beta_1 \ll 1$. Keeping only the lowest order 
terms in the Taylor series, this reduces to 


$$T' = \frac{1}{2}\left(\frac{m_1 m_2}{m_1 + m_2}\right)\beta_1^2. \tag{4.14}$$


We now define a quantity called the reduced mass M, given by 


$$M = \frac{m_1 m_2}{m_1 + m_2}, \tag{4.15}$$


from which we write 

$$T' = \tfrac{1}{2} M \beta_1^2. \tag{4.16}$$

We conclude from this that the two-body scattering problem in 
the CM frame is mathematically equivalent to a single particle of 
mass M scattering about a fixed point, which coincides with the 
center of momentum. This reduction of the scattering problem is 
called the equivalent one-body problem. 
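

To see how quickly the low-energy form takes over, the exact CM kinetic energy can be compared with the reduced-mass expression for a few incident velocities. The sketch below works in units with c = 1; the function names and the mass values (1 and 4 in arbitrary units) are hypothetical.

```python
import math

def cm_kinetic_energy(beta1, m1, m2):
    """Exact total kinetic energy in the CM frame, with c = 1."""
    gamma1 = 1.0 / math.sqrt(1.0 - beta1 * beta1)
    total = m1 + m2
    return total * (math.sqrt(1.0 + 2.0 * (gamma1 - 1.0) * m1 * m2 / total**2) - 1.0)

def cm_kinetic_energy_low(beta1, m1, m2):
    """Low-energy limit: half the reduced mass times beta1 squared."""
    reduced = m1 * m2 / (m1 + m2)        # reduced mass M
    return 0.5 * reduced * beta1 * beta1

# Hypothetical masses in arbitrary units; the ratio approaches 1 as beta1 -> 0.
for beta1 in (0.5, 0.1, 0.01):
    exact = cm_kinetic_energy(beta1, 1.0, 4.0)
    approx = cm_kinetic_energy_low(beta1, 1.0, 4.0)
    print(f"beta1 = {beta1:5.2f}  exact/approx = {exact / approx:.6f}")
```

The ratio departs visibly from unity only when the incident velocity becomes an appreciable fraction of the speed of light.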


We are now in a position to consider the final state after scattering. 
This is shown in the CM frame in Figure 4.2, where the 
particle with rest mass $m_1$ has final momentum $\mathbf{q}_1'$, and the particle 
with rest mass $m_2$ has final momentum $\mathbf{q}_2'$. We assume total 
energies $\varepsilon_1'$ and $\varepsilon_2'$ for the two particles after scattering, where 


$$\begin{aligned}
\varepsilon_1'^2 &= q_1'^2 + m_1^2 \\
\varepsilon_2'^2 &= q_2'^2 + m_2^2.
\end{aligned} \tag{4.17}$$


The scattering angle in the CM frame is defined as $\theta'$. In later 
sections, we will address the central scattering problem, namely, 
calculation of the scattered intensity as a function of $\theta'$. Consequently, 
we assume $\theta'$ to be known for now from the calculation 
to come. 


Since measurement is always performed in the lab frame, we must 
express the relevant quantities there. This is shown in Figure 4.3, 
where the incident particle with mass $m_1$ is assumed to scatter 
through angle $\theta_1$, and the target particle with mass $m_2$ is assumed 
to scatter through angle $\theta_2$. We wish to calculate the scattering 
angles $\theta_1$ and $\theta_2$ in the lab frame in terms of the scattering 
angle $\theta'$ in the CM frame. This is accomplished in principle by 
Lorentz transformation from the CM to lab frame as follows: 


$$\begin{aligned}
q_{1z} &= \gamma\,(q_{1z}' + \beta\, \varepsilon_1') \\
\varepsilon_1 &= \gamma\,(\varepsilon_1' + \beta\, q_{1z}') \\
q_{2z} &= \gamma\,(q_{2z}' + \beta\, \varepsilon_2') \\
\varepsilon_2 &= \gamma\,(\varepsilon_2' + \beta\, q_{2z}'),
\end{aligned} \tag{4.18}$$

where we have replaced $\beta$ by $-\beta$, and interchanged primed and 
unprimed quantities in the earlier Lorentz transformation. We have 
assumed that the relative velocity of the two reference frames is 
along the z-axis. Thus only the z-components of the momenta are 
altered by the Lorentz transformation, while the transverse 
components remain unaltered. 


Figure 4.2: Initial and final states, CM frame. 


Figure 4.3: Initial and final states, lab frame. 


We assume that, in the CM frame, the scattering can be described 
by 


$$q_1' = p_1', \qquad \varepsilon_1' = E_1' \tag{4.19}$$

for the particle with rest mass $m_1$, and 

$$q_2' = p_2', \qquad \varepsilon_2' = E_2' \tag{4.20}$$


for the particle with rest mass $m_2$. These conditions express 
conservation of the magnitude of momentum for each particle individually. 
By implication, total energy is conserved for each particle 
individually. This is valid in the CM frame, but not in the lab 
frame. Substituting above, we obtain 


$$\begin{aligned}
q_{1z} &= \gamma\,[p' \cos\theta' + \beta E_1'] \\
\varepsilon_1 &= \gamma\,[E_1' + \beta p' \cos\theta'] \\
q_{2z} &= \gamma\,[p' \cos(\pi - \theta') + \beta E_2'] \\
\varepsilon_2 &= \gamma\,[E_2' + \beta p' \cos(\pi - \theta')].
\end{aligned} \tag{4.21}$$

The transverse x-components of the final momentum are transformed 
as follows: 


$$\begin{aligned}
q_{1x} &= q_{1x}' = p' \sin\theta' \\
q_{2x} &= q_{2x}' = -p' \sin\theta'.
\end{aligned} \tag{4.22}$$


From this we obtain 


$$\begin{aligned}
\tan\theta_1 &= \frac{q_{1x}}{q_{1z}} = \frac{p' \sin\theta'}{\gamma\,(p' \cos\theta' + \beta E_1')} \\
\tan\theta_2 &= \frac{q_{2x}}{q_{2z}} = \frac{-p' \sin\theta'}{\gamma\,(-p' \cos\theta' + \beta E_2')}.
\end{aligned} \tag{4.23}$$


We now make use of 


$$\begin{aligned}
\frac{\beta E_1'}{p'} &= \frac{m_1^2 + E_1 m_2}{m_2^2 + E_1 m_2} \equiv \alpha \\
\frac{\beta E_2'}{p'} &= \frac{m_2^2 + E_1 m_2}{m_2^2 + E_1 m_2} = 1.
\end{aligned} \tag{4.24}$$


We have defined a new dimensionless quantity $\alpha$ in terms of quantities 
which are all known. Substituting, we express the final scattering 
angles $\theta_1$ and $\theta_2$ in the lab frame, in terms of the known 
CM scattering angle $\theta'$ as 


$$\begin{aligned}
\tan\theta_1 &= \frac{(m_1^2 + m_2^2 + 2E_1 m_2)^{1/2}\, \sin\theta'}{(E_1 + m_2)\,(\cos\theta' + \alpha)} \\
\tan\theta_2 &= \frac{(m_1^2 + m_2^2 + 2E_1 m_2)^{1/2}\, \sin\theta'}{(E_1 + m_2)\,(\cos\theta' - 1)}.
\end{aligned} \tag{4.25}$$

These two equations express the scattering angles in the lab frame 
in terms of quantities which are all known. This represents the 
main result to this point. 
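

The lab-frame angles are straightforward to evaluate numerically. The sketch below implements the two tangent relations with `math.atan2`, which keeps the correct quadrant when a denominator changes sign; the sign convention (a negative recoil angle for the target, on the opposite side of the beam axis) follows from the opposite transverse momenta. The function name `lab_angles` and the sample inputs are assumptions for illustration, in units with c = 1.

```python
import math

def lab_angles(theta_cm, p, m1, m2):
    """Lab-frame scattering angles (theta1, theta2) from the CM angle, with c = 1."""
    E1 = math.sqrt(p * p + m1 * m1)
    E_cm = math.sqrt(m1 * m1 + m2 * m2 + 2.0 * E1 * m2)
    alpha = (m1 * m1 + E1 * m2) / (m2 * m2 + E1 * m2)   # dimensionless ratio alpha
    s, c = math.sin(theta_cm), math.cos(theta_cm)
    # Numerator and denominator are scaled by the same positive factor,
    # so atan2 returns the angle of the momentum vector itself.
    theta1 = math.atan2(E_cm * s, (E1 + m2) * (c + alpha))
    theta2 = math.atan2(-E_cm * s, (E1 + m2) * (1.0 - c))
    return theta1, theta2

# Hypothetical inputs: slow equal-mass collision, and a very heavy target.
t1_equal, t2_equal = lab_angles(1.0, 1e-4, 1.0, 1.0)
t1_heavy, _ = lab_angles(1.0, 0.5, 1.0, 1e6)
print("equal masses:", t1_equal, t2_equal)
print("heavy target:", t1_heavy)
```

For slow equal masses the two outgoing directions come out very nearly perpendicular, and for a very heavy target the lab angle of the incident particle approaches the CM angle, as expected.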


It is of great interest to investigate the momentum and energy 
transferred in the lab frame. These are found by subtracting the 
initial state values from the final state values. This is embodied in 
the equations 


$$\begin{aligned}
\Delta\mathbf{p}_1 &= \mathbf{q}_1 - \mathbf{p}_1 \\
\Delta\mathbf{p}_2 &= \mathbf{q}_2 - \mathbf{p}_2 \\
\Delta E_1 &= \varepsilon_1 - E_1 \\
\Delta E_2 &= \varepsilon_2 - E_2,
\end{aligned} \tag{4.26}$$


where the subscripts 1 and 2 refer to the incident and target particles, 
respectively, and where $\Delta\mathbf{p}_i$ is the transferred vector kinetic 
momentum and $\Delta E_i$ is the transferred total energy. Since 
the scattering takes place in a single plane, the momentum p is a 
two-vector. In the following, we label the direction of the incident 
particle momentum as the z-axis, and the orthogonal axis as the 
x-axis. It is straightforward, but quite tedious to calculate the 
momentum and energy transfer. It is left as an exercise to the reader 
to set up the algebra, based on the above analysis. Indeed, an 
ambitious reader could carry this through to a closed-form solution. 


The problem becomes greatly simpler in the nonrelativistic ap- 
proximation. This approximation is relevant to a large variety of 
charged particle instruments, which operate at low energy, where 
the kinetic energy is small relative to the particle rest-mass energy. 
This approximation also provides significant intuitive insight into 
the scattering process. 


We continue to use the same notation for the initial state in the lab 
frame, namely, we assume that particle 1 (the incident particle) 
has rest mass $m_1$, vector kinetic momentum $\mathbf{p}_1$, and total energy 
$E_1$. We assume that particle 2 (the target particle) has rest mass 
$m_2$, vector kinetic momentum $\mathbf{p}_2$, and total energy $E_2$. 


We also continue to use the same notation for the final state after 
scattering, namely, we assume that particle 1 has rest mass $m_1$, 
vector kinetic momentum $\mathbf{q}_1$, and total energy $\varepsilon_1$. We assume that 
particle 2 has rest mass $m_2$, vector kinetic momentum $\mathbf{q}_2$, and 
total energy $\varepsilon_2$. 


In the nonrelativistic limit these quantities are related by 


$$\begin{aligned}
E_1 &= \frac{p_1^2}{2m_1} \\
E_2 &= \frac{p_2^2}{2m_2} \\
\varepsilon_1 &= \frac{q_1^2}{2m_1} \\
\varepsilon_2 &= \frac{q_2^2}{2m_2}.
\end{aligned} \tag{4.27}$$


We assume the initial condition in the lab (unprimed) frame that 
$\mathbf{p}_1 = \mathbf{p}$ and $\mathbf{p}_2 = 0$, where p is oriented along the +z axis, and is 
known a priori. 


With these assumptions, we now proceed to find $\mathbf{q}_1$ and $\mathbf{q}_2$, making 
use of conservation of momentum and energy. As before, we transform 
from the lab to the CM frame, then calculate the scattering 
probability as a function of scattering angle in the CM frame, then 
finally transform back to the lab frame. By definition $\mathbf{p}_2' = -\mathbf{p}_1'$ in 
the (primed) CM frame, since the total momentum is 
zero in this frame. Separately, the individual particle velocities in 
the CM frame are given in terms of the velocities in the lab frame 
by 


$$\begin{aligned}
v_1' &= v_1 - v \\
v_2' &= v_2 - v,
\end{aligned} \tag{4.28}$$

where v is the velocity of the CM, measured in the lab frame. 


From the above assumptions, it is left as an exercise to the reader 
to show that the relative velocity of the two frames is given by 


$$v = \frac{p}{m_1 + m_2}. \tag{4.29}$$

Further, it follows that 

$$\begin{aligned}
p_1' &= p\, \frac{m_2}{m_1 + m_2} \\
p_2' &= -p\, \frac{m_2}{m_1 + m_2},
\end{aligned} \tag{4.30}$$


which satisfies the condition that $p_2' = -p_1'$ as required for the CM 
frame. 


Next we invoke the condition that the scalar kinetic momentum
is preserved in the scattering for each particle individually in the
CM frame, that is,

$$
q_1' = p_1', \qquad q_2' = p_2'. \tag{4.31}
$$

This is consistent with the fact that the vector momenta
$\mathbf{q}_1'$ and $\mathbf{q}_2'$ after the collision are equal and
opposite in the CM frame.


Taking account of these, we resolve the momenta $\mathbf{q}_1'$ and
$\mathbf{q}_2'$ for the respective particles into transverse
$x$-components and longitudinal $z$-components:

$$
\begin{aligned}
q_{1x}' &= p_1' \sin\theta', \\
q_{1z}' &= p_1' \cos\theta', \\
q_{2x}' &= -p_2' \sin\theta', \\
q_{2z}' &= -p_2' \cos\theta',
\end{aligned} \tag{4.32}
$$

where $0 < \theta' < \pi$, and $p_1' = p_2' = p\,m_2/(m_1 + m_2)$ are
the magnitudes given by (4.30). Next, we transform to the lab frame.
By velocity addition of the longitudinal components only,

$$
\begin{aligned}
q_{1x} &= q_{1x}', \\
q_{1z} &= q_{1z}' + m_1 v, \\
q_{2x} &= q_{2x}', \\
q_{2z} &= q_{2z}' + m_2 v.
\end{aligned} \tag{4.33}
$$


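Substituting, it follows that

$$
\begin{aligned}
q_{1x} &= p\,\frac{m_2}{m_1 + m_2}\,\sin\theta', \\
q_{1z} &= p\,\frac{m_2}{m_1 + m_2}\left(\cos\theta' + \frac{m_1}{m_2}\right), \\
q_{2x} &= -p\,\frac{m_2}{m_1 + m_2}\,\sin\theta', \\
q_{2z} &= -p\,\frac{m_2}{m_1 + m_2}\left(\cos\theta' - 1\right).
\end{aligned} \tag{4.34}
$$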


This represents the solution for the final momenta, where the right- 
hand sides consist of all known quantities. 


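It is straightforward to calculate the momentum transferred to
the two particles in the lab frame,
$\Delta\mathbf{p}_i = \mathbf{q}_i - \mathbf{p}_i$. This is

$$
\begin{aligned}
\Delta p_{1x} &= p\,\frac{m_2}{m_1 + m_2}\,\sin\theta', \\
\Delta p_{1z} &= p\,\frac{m_2}{m_1 + m_2}\left(\cos\theta' - 1\right), \\
\Delta p_{2x} &= -p\,\frac{m_2}{m_1 + m_2}\,\sin\theta', \\
\Delta p_{2z} &= -p\,\frac{m_2}{m_1 + m_2}\left(\cos\theta' - 1\right).
\end{aligned} \tag{4.35}
$$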


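The energy transferred between the two particles in the lab frame,
$\Delta E_i = \mathcal{E}_i - E_i$, follows immediately as

$$
\begin{aligned}
\Delta E_1 &= \frac{p^2 m_2}{(m_1 + m_2)^2}\left(\cos\theta' - 1\right), \\
\Delta E_2 &= -\frac{p^2 m_2}{(m_1 + m_2)^2}\left(\cos\theta' - 1\right).
\end{aligned} \tag{4.36}
$$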


The scattering angles $\theta_1$ and $\theta_2$ in the lab frame are
easily found to obey

$$
\begin{aligned}
\tan\theta_1 &= \frac{\sin\theta'}{\cos\theta' + m_1/m_2}, \\
\tan\theta_2 &= \frac{\sin\theta'}{\cos\theta' - 1}.
\end{aligned} \tag{4.37}
$$

This agrees with the earlier relativistic result in the limit where the
kinetic energy is negligible compared with the rest mass.
Mathematically, this is equivalent to $E_i \approx m_i$. The reader is
encouraged to verify these results.


We have thus succeeded in calculating the momentum and energy
transfer in closed form, in the nonrelativistic limit. We have also
calculated the scattering angles $\theta_1$ and $\theta_2$ in the lab
frame, in the relativistic case and in the nonrelativistic
approximation. This represents the complete solution to the
two-particle scattering kinematics. This will prove very useful in the
following sections.
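
As a numerical cross-check of (4.34) and (4.36), the following is a
minimal sketch in Python. The function names and the sample values
(arbitrary units, $m_2 = 4 m_1$, $\theta' = 60°$) are illustrative
assumptions, not part of the derivation. It confirms that the lab-frame
momenta conserve total momentum and kinetic energy.

```python
import math

def lab_final_momenta(p, m1, m2, theta_cm):
    """Lab-frame (x, z) momenta of both particles after scattering, per (4.34)."""
    scale = p * m2 / (m1 + m2)  # CM-frame momentum magnitude, from (4.30)
    q1 = (scale * math.sin(theta_cm), scale * (math.cos(theta_cm) + m1 / m2))
    q2 = (-scale * math.sin(theta_cm), -scale * (math.cos(theta_cm) - 1.0))
    return q1, q2

def kinetic_energy(q, m):
    """Nonrelativistic kinetic energy q^2 / (2 m), as in (4.27)."""
    return (q[0] ** 2 + q[1] ** 2) / (2.0 * m)

# Illustrative values in arbitrary units (assumptions for this sketch).
p, m1, m2, theta_cm = 1.0, 1.0, 4.0, math.radians(60.0)
q1, q2 = lab_final_momenta(p, m1, m2, theta_cm)

px_total = q1[0] + q2[0]   # should vanish: no initial transverse momentum
pz_total = q1[1] + q2[1]   # should equal the incident momentum p
e_total = kinetic_energy(q1, m1) + kinetic_energy(q2, m2)
e_initial = p ** 2 / (2.0 * m1)

# Energy gained by the target, compared with the closed form (4.36).
delta_e2 = kinetic_energy(q2, m2)
delta_e2_formula = -p ** 2 * m2 * (math.cos(theta_cm) - 1.0) / (m1 + m2) ** 2

print(px_total, pz_total, e_total - e_initial, delta_e2 - delta_e2_formula)
```

All four printed quantities should agree with their expected values
(0, p, 0, 0) to within floating-point rounding.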


248 Chapter 4. Particle scattering 


4.2 Scattering cross section and classical scattering


Having obtained the reduction to the equivalent one-body problem,
we are now in a position to address our central problem, namely,
calculation of the scattered intensity as a function of scattering
angle. For simplicity of notation in the following, we make the
substitution

$$
\vartheta = \theta' \tag{4.38}
$$

for the scattering angle in the CM frame. From the preceding
analysis, this is equivalent to the scattering angle of the reduced
mass $M$ from the fixed scattering center in the CM frame.


It is instructive to first study elastic scattering in the context of
classical mechanics. The geometry is shown schematically in Figure 4.4
for a repulsive scattering force. The particle of mass $M$ in the
equivalent one-body problem is incident from the left, and initially
travels along the $+z$ direction. The fixed scattering center at $O$
causes the particle to trace out a curved trajectory given by
$r(\theta)$. The particle passes to a very large distance, where the
scattering force becomes negligible. The resulting scattering angle
is $\vartheta$, not to be confused with the instantaneous angular
coordinate $\theta$.


Figure 4.4: Classical elastic scattering geometry.


We assume a uniformly dense beam of many particles incident
on the scattering center from the left. We expect that the number
of scattered particles $dN$ detected at angle $\vartheta$ in a time
interval $dt$ must be proportional to the product of the incident
intensity $S_0$ times the scattered solid angle element $d\Omega$
times the time interval $dt$, i.e.,

$$
dN = \sigma(\vartheta)\, S_0\, d\Omega\, dt, \tag{4.39}
$$

where $\sigma(\vartheta)$ is a proportionality factor which depends on
the scattering angle $\vartheta$. This factor contains all relevant
information about the details of the scattering process. It is called
the differential cross section. Rearranging factors, this is

$$
\sigma(\vartheta) = \frac{1}{S_0}\,\frac{dN}{d\Omega\, dt}. \tag{4.40}
$$

The differential cross section is the number of scattered particles
per unit solid angle, per unit time, per unit incident intensity. It
has units of area. Mathematically, the central problem is to find
the differential cross section $\sigma(\vartheta)$.


The strength of the scattering, and hence the resulting angle
$\vartheta$, decreases as the distance $b$ increases. This distance,
the perpendicular offset of the incident asymptote from the scattering
center, is called the impact parameter. The problem is axially
symmetric about the $z$-axis. Considering the range of possible
values of $b$, we therefore write

$$
S_0 = \frac{dN}{dA_0\, dt} = \frac{1}{2\pi b\, db}\,\frac{dN}{dt}, \tag{4.41}
$$

where the cross-sectional area element $dA_0$ is an annulus centered
on the $z$-axis. The final solid angle element $d\Omega$ is given by

$$
d\Omega = \frac{dA}{r^2} = 2\pi \sin\vartheta\, d\vartheta, \tag{4.42}
$$

where the area element $dA$ is an annulus on a sphere of very large
radius $r$ centered about the scattering center at $O$. Substituting
(4.41, 4.42) into (4.40), the differential cross section is then

$$
\sigma(\vartheta) = \frac{b}{\sin\vartheta}\left|\frac{db}{d\vartheta}\right|. \tag{4.43}
$$

The absolute value appears because $b$ decreases as $\vartheta$
increases, while the cross section is positive. This applies to any
case with axial symmetry, regardless of the detailed dependence of the
scattering force on the separation $r$.


We assume that all relevant information about the scattering
forces is contained in the potential energy $U(\mathbf{x})$ between the
two particles, where $\mathbf{x}$ is the three-vector spatial
separation between the particles. We assume $U$ to be known. For the
present analysis we now consider the special case for which the
potential energy can be written as

$$
U(\mathbf{x}) = U(r) = \frac{k}{r}, \tag{4.44}
$$

where $r = |\mathbf{x}|$. The potential energy is inversely
proportional to the magnitude of the separation $r$ between the two
particles, and is spherically symmetric. This is the electrostatic
potential energy arising from the Coulomb interaction between two
charges $q_1$ and $q_2$ separated by a distance $r$. With charges of
opposite sign, $k < 0$, giving rise to an attractive force. With
charges of like sign, $k > 0$, giving rise to a repulsive force. This
is just the classical Kepler problem. An equation for the trajectory
is expressed in polar coordinates as the radius $r$ as a function of
the angular coordinate $\theta$. This was derived previously in the
section on applications of Hamilton-Jacobi theory. It is

$$
\frac{1}{r} = -\frac{Mk}{L^2}\left[\,1 + \sqrt{1 + \frac{2HL^2}{Mk^2}}\;\cos(\theta - \theta_0)\right], \tag{4.45}
$$

where $H$ is the Hamiltonian, which represents the conserved total
energy, and $L$ is the conserved angular momentum about the scattering
center at $O$. The trajectory is symmetric about a line going
outward from the scattering center at angle $\theta_0$, because the
cosine is an even function. The square root is called the eccentricity
$e$ of the orbit. In the case where $e > 1$ the trajectory is a
hyperbola, with asymptotes shown in Figure 4.4. In the case of an
attractive force, this requires that the total energy $H$ be
sufficiently high that the particle is not bound.


Letting $r \to \infty$ after the scattering has taken place, we obtain

$$
1 + \sqrt{1 + \frac{2HL^2}{Mk^2}}\;\cos(\vartheta - \theta_0) = 0. \tag{4.46}
$$

It is evident from the figure that

$$
2(\theta_0 - \vartheta) + \vartheta = \pi, \tag{4.47}
$$

from which it follows that

$$
\cos(\vartheta - \theta_0) = \cos\left(\frac{\vartheta}{2} - \frac{\pi}{2}\right) = \sin(\vartheta/2). \tag{4.48}
$$

The conserved angular momentum $\mathbf{L}$ is given at any given point
along the trajectory by

$$
\mathbf{L} = \mathbf{r} \times \mathbf{p}. \tag{4.49}
$$

Considering the incident particle far from the scattering center,
we write

$$
L = \lim_{r \to \infty} r\, p_0\, \frac{b}{r} = b\sqrt{2MH}, \tag{4.50}
$$

where $p_0 = \sqrt{2MH}$ is the initial momentum. Substituting this
into (4.46), we find

$$
\sin(\vartheta/2) = -\left(1 + \frac{4H^2 b^2}{k^2}\right)^{-1/2}. \tag{4.51}
$$

Squaring and solving for the impact parameter $b$, this leads to

$$
b = \frac{k}{2H}\cot(\vartheta/2), \tag{4.52}
$$

where the overall sign is immaterial, because $b$ and $db/d\vartheta$
enter the cross section (4.43) only through the magnitude of their
product. Differentiating, we obtain

$$
\frac{db}{d\vartheta} = -\frac{k}{4H}\,\frac{1}{\sin^2(\vartheta/2)}. \tag{4.53}
$$

Substituting (4.52, 4.53) into (4.43), we obtain the result for the
differential cross section as

$$
\sigma(\vartheta) = \frac{k^2}{16 H^2 \sin^4(\vartheta/2)}. \tag{4.54}
$$

This is the dependence seen by Rutherford in the scattering of
alpha particles by a gold foil, from which the nuclear model of the
atom was originally deduced. It is therefore called the Rutherford
scattering cross section. It is the same for both signs of $k$, and is
therefore independent of whether the scattering force is attractive
or repulsive. It approaches infinity at zero scattering angle, and
has a finite value for backscattering at $\vartheta = \pi$.
Integrating over all solid angle, we form the total cross section.
This is

$$
\sigma_{\rm total} = 2\pi \int_0^{\pi} \sigma(\vartheta)\, \sin\vartheta\, d\vartheta. \tag{4.55}
$$

This is infinite for the case of Rutherford scattering. Physically,
this means that the Coulomb potential effectively has infinite
range.
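
The step from (4.52) to (4.54) can be checked numerically. The sketch
below, in Python, evaluates (4.43) with a finite-difference derivative
of $b(\vartheta)$ and compares it with the closed form (4.54). The
function names and the values of $k$ and $H$ (arbitrary units,
repulsive case $k > 0$) are assumptions chosen for illustration.

```python
import math

def impact_parameter(theta, k, H):
    """Impact parameter b(theta) from (4.52)."""
    return k / (2.0 * H) / math.tan(theta / 2.0)

def sigma_from_b(theta, k, H, h=1e-6):
    """Cross section from (4.43); db/dtheta by central difference."""
    db = (impact_parameter(theta + h, k, H)
          - impact_parameter(theta - h, k, H)) / (2.0 * h)
    return impact_parameter(theta, k, H) / math.sin(theta) * abs(db)

def sigma_rutherford(theta, k, H):
    """Closed-form Rutherford cross section (4.54)."""
    return k ** 2 / (16.0 * H ** 2 * math.sin(theta / 2.0) ** 4)

k, H = 1.0, 2.0                     # arbitrary units (illustrative)
angles = [0.3, 1.0, 2.0, 3.0]       # radians
ratios = [sigma_from_b(t, k, H) / sigma_rutherford(t, k, H) for t in angles]
print(ratios)
```

Each ratio should be very close to 1, which confirms that (4.54)
follows from (4.43) and (4.52).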


We are now in a good position to consider quantum mechanical 
elastic scattering. This is the subject of the next three sections. 


4.3 Integral expression of Schrödinger’s equation


All relevant information about quantum mechanical scattering is
contained in the differential cross section, which was defined in
the preceding section. The cross section in turn depends on the
wave function $\psi(\mathbf{x},t)$, which is a solution of the
time-dependent Schrödinger equation (3.13) with appropriate boundary
conditions. In this section we cast Schrödinger’s equation in a form
which will prove to be directly applicable to the scattering problem.


In the important special case where the electrostatic potential
$\phi(\mathbf{x})$ has no explicit time dependence, the wave function
$\psi(\mathbf{x},t)$ takes on the separable form (3.18)

$$
\psi(\mathbf{x},t) = u(\mathbf{x})\, e^{-iHt/\hbar}, \tag{4.56}
$$

where $H$ is the eigenvalue for the conserved total energy.


We now proceed to apply this formalism to the scattering problem.
The spatial part $u(\mathbf{x})$ satisfies

$$
\nabla^2 u(\mathbf{x}) + k^2 u(\mathbf{x}) = \frac{2m}{\hbar^2}\, U(\mathbf{x})\, u(\mathbf{x}), \tag{4.57}
$$

where $U(\mathbf{x}) = q\,\phi(\mathbf{x})$ is the potential energy
associated with the scattering center, and

$$
k^2 = \frac{2mH}{\hbar^2}. \tag{4.58}
$$

This is recognizable as the Helmholtz equation. It is inhomogeneous,
owing to the source term on the right-hand side. The geometry is shown
schematically in Figure 4.5. The scattering center is located at point
$O$. The scattering is described by the potential energy
$U(\mathbf{x}_1)$, assumed spherically symmetric about $O$. We seek
the scattered wave function $u(\mathbf{x})$ at the observation point
$P$, located at position $\mathbf{x}$, a large distance from $O$. The
incident plane wave and the scattered spherical wave are described by
the wave vectors $\mathbf{k}_0$ and $\mathbf{k}$, respectively. The
angle $\vartheta$ between them is the scattering angle. The difference
between these outgoing and incoming wave vectors is
$\mathbf{q} = \mathbf{k} - \mathbf{k}_0$. For elastic scattering the
magnitudes $k_0$ and $k$ are equal.


Figure 4.5: Geometry for scattering.


We postulate a point source at $P$, radiating an outgoing spherical
wave, represented by the complex wave function

$$
G(R) = \frac{1}{R}\exp(ikR). \tag{4.59}
$$

This point source does not exist physically, but provides a
mathematical aid to solve the problem. The amplitude $G(R)$ must
satisfy the Helmholtz equation,

$$
\nabla^2 G(R) + k^2 G(R) = 0 \tag{4.60}
$$

everywhere except at $P$, where $G(R)$ has a singularity. To verify
that this is the case, we express the Laplacian operator $\nabla^2$ in
spherical coordinates, leading to

$$
\frac{1}{R}\frac{\partial^2}{\partial R^2}\left[R\,G(R)\right] + k^2 G(R) = 0. \tag{4.61}
$$

The solution is immediately recognizable as

$$
R\,G(R) = \exp(\pm ikR), \tag{4.62}
$$

in agreement with (4.59) as required. We evaluate $G(R)$ at the
field point $\mathbf{x}_1$ in Figure 4.5, where

$$
R = |\mathbf{x} - \mathbf{x}_1|. \tag{4.63}
$$
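
The claim that (4.59) satisfies the radial Helmholtz equation (4.61)
away from $R = 0$ can be checked numerically. The following Python
sketch evaluates the left side of (4.61) with a finite-difference
second derivative. The function names, the wave number $k = 3$, and the
sample radii are illustrative assumptions.

```python
import cmath

def green(R, k):
    """Outgoing spherical wave G(R) = exp(i k R) / R, as in (4.59)."""
    return cmath.exp(1j * k * R) / R

def helmholtz_residual(R, k, h=1e-4):
    """Left side of (4.61): (1/R) d^2(R G)/dR^2 + k^2 G."""
    def f(r):
        return r * green(r, k)
    second = (f(R + h) - 2.0 * f(R) + f(R - h)) / h ** 2  # central difference
    return second / R + k ** 2 * green(R, k)

k = 3.0  # arbitrary wave number (illustrative)
residuals = [abs(helmholtz_residual(R, k)) for R in (0.5, 1.0, 2.5)]
print(residuals)
```

The residuals should be zero to within the finite-difference error,
which is small compared with the individual terms of order $k^2/R$.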


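For notational purposes, we denote $G(R) = G(\mathbf{x}, \mathbf{x}_1)$
in the following, where

$$
G(\mathbf{x}, \mathbf{x}_1) = \frac{\exp\left(ik|\mathbf{x} - \mathbf{x}_1|\right)}{|\mathbf{x} - \mathbf{x}_1|} \tag{4.64}
$$

represents the outgoing spherical wave. Multiplying (4.57) by
$G(\mathbf{x}, \mathbf{x}_1)$, multiplying (4.60) by
$u(\mathbf{x}_1)$, and subtracting the two equations, we obtain

$$
G(\mathbf{x}, \mathbf{x}_1)\,\nabla_1^2 u(\mathbf{x}_1) - u(\mathbf{x}_1)\,\nabla_1^2 G(\mathbf{x}, \mathbf{x}_1)
= \frac{2m}{\hbar^2}\, U(\mathbf{x}_1)\, G(\mathbf{x}, \mathbf{x}_1)\, u(\mathbf{x}_1). \tag{4.65}
$$

We now proceed to integrate this over the entire space, excepting
the small sphere about $P$, where $G$ is singular. We make use of
the divergence theorem to convert the volume integral on the left
side to a surface integral over the entire surface $S_1$ surrounding
the volume. This gives

$$
\oint_{S_1} dS_1 \left[ G(\mathbf{x}, \mathbf{x}_1)\,\frac{\partial}{\partial n} u(\mathbf{x}_1) - u(\mathbf{x}_1)\,\frac{\partial}{\partial n} G(\mathbf{x}, \mathbf{x}_1) \right]
= \frac{2m}{\hbar^2} \int d^3x_1\, U(\mathbf{x}_1)\, G(\mathbf{x}, \mathbf{x}_1)\, u(\mathbf{x}_1), \tag{4.66}
$$

where $n$ denotes the outward normal to the surface $S_1$. The surface
integral over $S_1$ consists of the sum of two contributions, namely,
the small sphere $S_\epsilon$ of radius $\epsilon$ about $P$, and a
large sphere $S_\infty$ at infinity. Taking the small sphere first, we
find

$$
\oint_{S_\epsilon} dS_\epsilon \left[ G(\mathbf{x}, \mathbf{x}_1)\,\frac{\partial}{\partial n} u(\mathbf{x}_1) - u(\mathbf{x}_1)\,\frac{\partial}{\partial n} G(\mathbf{x}, \mathbf{x}_1) \right]
\to -4\pi\, u(\mathbf{x}) \tag{4.67}
$$

in the limit $\epsilon \to 0$, where the second term on the left
predominates, and the first term becomes negligible. Considering the
sphere $S_\infty$ at infinity, we find

$$
\oint_{S_\infty} dS_\infty \left[ G(\mathbf{x}, \mathbf{x}_1)\,\frac{\partial}{\partial n} u(\mathbf{x}_1) - u(\mathbf{x}_1)\,\frac{\partial}{\partial n} G(\mathbf{x}, \mathbf{x}_1) \right]
\to \oint_{S_\infty} \left( \frac{\partial u}{\partial n} - iku \right) R^2\, G(R)\, d\Omega, \tag{4.68}
$$

where $d\Omega$ is the element of solid angle. The right side
vanishes, as long as

$$
\left( \frac{\partial u}{\partial n} - iku \right) R \to 0 \tag{4.69}
$$

at infinity. This is, in fact, the case, where (4.69) is known as the
Sommerfeld radiation condition [85].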


We are thus left with an equation for $u(\mathbf{x})$ as follows:

$$
u(\mathbf{x}) = -\frac{m}{2\pi\hbar^2} \int d^3x_1\, G(\mathbf{x}; \mathbf{x}_1)\, U(\mathbf{x}_1)\, u(\mathbf{x}_1), \tag{4.70}
$$

where $G(\mathbf{x}; \mathbf{x}_1)$ is the Green’s function (4.64).
This equation represents the main result of this section. It can be
regarded as completely equivalent to the differential equation for
$u(\mathbf{x})$ (4.57), which is the spatial part of Schrödinger’s
equation for the stationary-state case. We recall that
$|u(\mathbf{x})|^2$ is the probability density that a single, precise
measurement of the scattered particle position will find the particle
at position $\mathbf{x}$. This represents the connection with
experimental measurement. The equation (4.70) has the advantage of
being more directly applicable to the scattering problem. We will make
use of this in the following section.


4.4 Green’s function solution for elastic scattering


We now turn our attention to the important special case of
two-particle scattering in which the incident particle transfers
negligible energy to the target particle. This process is known as
elastic scattering. The central problem is to calculate the
differential cross section $\sigma(\vartheta)$, which gives the
scattered intensity at angle $\vartheta$.


In the case where the incident and target particles each retain
their same rest mass in the initial and final states, and we
disregard internal degrees of freedom for each particle, the total
energy is conserved for each particle individually in the CM frame. In
the nonrelativistic limit, the two-body scattering is reduced to the
equivalent one-body scattering. In this case a single particle with
reduced mass $M$ scatters from a fixed center, where $M$ is given by
(4.15)

$$
M = \frac{m_1 m_2}{m_1 + m_2}, \tag{4.71}
$$

where $m_1$ and $m_2$ are the rest masses of the incident and target
particles, respectively.


In the case where $m_1 \ll m_2$, the CM moves slowly in the lab
frame. Also, $M \approx m_1$. In this limit the incident particle
transfers a small fraction of its energy to the target particle. The
scattering can therefore be regarded as elastic in the lab frame, as
well as in the CM frame. This is a fair approximation for a fast
electron incident on an atomic nucleus, for example.


In the following analysis, we will calculate the differential cross
section $\sigma(\vartheta)$ for the equivalent one-body scattering in
the CM frame. The geometry of the scattering is shown for the
equivalent one-body problem in Figure 4.5. We assume a plane wave
incident from the left, with wave vector $\mathbf{k}_0$ oriented along
the positive $z$-axis. A scattering center is located at $O$, and an
observation point at $P$, at position $\mathbf{x}$. A spherical wave
with wave vector $\mathbf{k}$ emanates from the scattering center $O$.
The polar scattering angle between the incident and scattered wave
vectors $\mathbf{k}_0$ and $\mathbf{k}$ is $\vartheta$.


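We define the normalized incident wave function $u_0(\mathbf{x})$ as
the plane wave

$$
u_0(\mathbf{x}) = \frac{1}{\sqrt{V}}\, e^{i\mathbf{k}_0 \cdot \mathbf{x}}. \tag{4.72}
$$

We assume the observation point $P$ is located far from the
scattering center. As such, the scattered wave $u(\mathbf{x})$ can be
approximated by a spherical wave,

$$
u(\mathbf{x}) = f(\mathbf{k}_0, \mathbf{k})\, \frac{1}{\sqrt{V}}\, \frac{e^{ikr}}{r}, \tag{4.73}
$$

where we define $r = |\mathbf{x}|$, and the factor
$f(\mathbf{k}_0, \mathbf{k})$ is called the scattering amplitude.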


Strictly, the incident plane wave has infinite extent. However, in
practice the incident beam is typically collimated, so that the
incident and scattered waves do not interfere at the observation point
$P$. We write the incident flux $S_0$ and the scattered flux $S$,
respectively, as

$$
S_0 = |u_0(\mathbf{x})|^2\, v_0, \qquad S = |u(\mathbf{x})|^2\, v, \tag{4.74}
$$

where $v_0$ and $v$ are the incident and scattered velocities,
respectively. For the case of elastic scattering, $v = v_0$. The
number of particles per unit time crossing the area element
$r^2\, d\Omega$ at $P$ is $S\, r^2\, d\Omega$. Equating this to
$\sigma\, S_0\, d\Omega$ from (4.39), it follows immediately that the
differential cross section is (4.73, 4.74)

$$
\sigma = |f(\mathbf{k}_0, \mathbf{k})|^2. \tag{4.75}
$$


We define a total wave function
$u_T(\mathbf{x}) = u_0(\mathbf{x}) + u(\mathbf{x})$ as the sum
of the incident and scattered wave functions. This must satisfy
Schrödinger’s equation,

$$
\left(\nabla^2 + k^2\right) u_T(\mathbf{x}) = \frac{2m}{\hbar^2}\, U(\mathbf{x})\, u_T(\mathbf{x}), \tag{4.76}
$$

where $U(\mathbf{x})$ is the potential energy associated with the
scattering center, and

$$
k^2 = \frac{2mH}{\hbar^2}, \tag{4.77}
$$

where $H$ is the continuous total energy eigenvalue associated with
the state $u_T$. This is recognizable as the Helmholtz equation. It is
inhomogeneous, owing to the source term on the right-hand side.


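From the previous section, this equation can be expressed in
integral form as

$$
u(\mathbf{x}) = -\frac{m}{2\pi\hbar^2} \int d^3x_1\, G(\mathbf{x}, \mathbf{x}_1)\, U(\mathbf{x}_1)\, u_T(\mathbf{x}_1), \tag{4.78}
$$

where $G(\mathbf{x}, \mathbf{x}_1)$ is the Green’s function given by

$$
G(\mathbf{x}, \mathbf{x}_1) = \frac{1}{|\mathbf{x} - \mathbf{x}_1|}\, \exp\left(ik|\mathbf{x} - \mathbf{x}_1|\right). \tag{4.79}
$$

The scattering potential energy $U(\mathbf{x}_1)$ is appreciably
different from zero only over a very small region of $\mathbf{x}_1$.
As the observation point $P$ is very far away, we assume
$r \gg r_1$, where we define $r = |\mathbf{x}|$ and
$r_1 = |\mathbf{x}_1|$. From the law of cosines,

$$
|\mathbf{x} - \mathbf{x}_1|^2 = r^2 + r_1^2 - 2\,\mathbf{x} \cdot \mathbf{x}_1
\approx r^2 \left(1 - \frac{2\,\mathbf{x} \cdot \mathbf{x}_1}{r^2}\right). \tag{4.80}
$$

Taking the square root of both sides, and retaining only the two
largest terms in the Taylor series expansion,

$$
|\mathbf{x} - \mathbf{x}_1| \approx r - \frac{\mathbf{x} \cdot \mathbf{x}_1}{r}. \tag{4.81}
$$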
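
To give a feeling for the quality of the far-field approximation
(4.81), the following Python sketch compares it with the exact
distance for a fixed source point and increasingly distant observation
points. The coordinates are arbitrary illustrative values, and the
helper names are assumptions of this sketch.

```python
import math

def exact_distance(x, x1):
    """Exact |x - x1| for 3-vectors given as tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x1)))

def far_field_distance(x, x1):
    """Approximation (4.81): r - (x . x1) / r."""
    r = math.sqrt(sum(a * a for a in x))
    return r - sum(a * b for a, b in zip(x, x1)) / r

x1 = (0.3, -0.2, 0.1)          # point inside the scattering region
errors = []
for r in (10.0, 100.0, 1000.0):
    x = (0.6 * r, 0.0, 0.8 * r)  # observation point at distance r
    errors.append(abs(exact_distance(x, x1) - far_field_distance(x, x1)))
print(errors)
```

The error falls roughly as $r_1^2/r$, consistent with the neglected
term in the expansion of (4.80).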


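It follows that (4.78, 4.79, 4.81)

$$
u(\mathbf{x}) \approx -\frac{m}{2\pi\hbar^2}\, \frac{e^{ikr}}{r} \int d^3x_1\, \exp\left(-ik\,\frac{\mathbf{x} \cdot \mathbf{x}_1}{r}\right) U(\mathbf{x}_1)\, u_T(\mathbf{x}_1). \tag{4.82}
$$

From the definition of the scattering amplitude $f$, it follows (4.73)
that

$$
f = -\frac{m}{2\pi\hbar^2}\, \sqrt{V} \int d^3x_1\, e^{-i\mathbf{k} \cdot \mathbf{x}_1}\, U(\mathbf{x}_1)\, u_T(\mathbf{x}_1), \tag{4.83}
$$

where we have defined the scattered wave vector as

$$
\mathbf{k} = k\,\frac{\mathbf{x}}{r}, \tag{4.84}
$$

noticing that $\mathbf{x}/r$ is the unit vector in the direction of
the scattering. This equation cannot be solved in closed form, because
of the presence of the still unknown $u_T$ under the integral.
Therefore, we must seek a suitable approximation. To this end we
replace $u_T$ under the integral by the incident wave function $u_0$.
This is known as the first Born approximation. It is justifiable when
the scattering is relatively weak. In this approximation, we write
(4.72, 4.83)

$$
f(\mathbf{q}) = -\frac{m}{2\pi\hbar^2} \int d^3x_1\, U(\mathbf{x}_1)\, e^{-i\mathbf{q} \cdot \mathbf{x}_1}, \tag{4.85}
$$

where we have defined the difference vector $\mathbf{q}$ as

$$
\mathbf{q} = \mathbf{k} - \mathbf{k}_0. \tag{4.86}
$$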


We recognize this as the Fourier transform of the scattering
potential energy distribution $U(\mathbf{x})$. As $\hbar\mathbf{k}_0$
and $\hbar\mathbf{k}$ represent the incident and scattered momenta,
respectively, it follows that $\hbar\mathbf{q}$ is the momentum
transferred in the collision. This expression for $f(\mathbf{q})$ is
quite general, as we have not yet specified the precise form of the
scattering potential energy $U(\mathbf{x}_1)$. It applies in many cases
of elastic scattering, including high energy particle physics and
electron microscopy, to name just two.


We now turn to the important special case where $U(\mathbf{x})$ is
spherically symmetric; i.e., $U(\mathbf{x}) = U(r)$. In spherical
coordinates,

$$
d^3x_1 = r_1^2 \sin\theta_1\, dr_1\, d\theta_1\, d\phi_1. \tag{4.87}
$$

We assume for the present analysis that the scattering is
azimuthally symmetric, in which case we immediately integrate over
$\phi_1$, to give (4.85, 4.87)

$$
f(q) = -\frac{m}{\hbar^2} \int_0^\infty dr_1\, r_1^2\, U(r_1) \int_0^\pi d\theta_1\, \sin\theta_1\, e^{-iqr_1\cos\theta_1}. \tag{4.88}
$$

Substituting $\cos\theta_1 = \mu$, we find

$$
\int_0^\pi d\theta_1\, \sin\theta_1\, e^{-iqr_1\cos\theta_1}
= \int_{-1}^{1} d\mu\, e^{-iqr_1\mu}
= \frac{2\sin(qr_1)}{qr_1} \tag{4.89}
$$

and (4.88, 4.89)

$$
f(q) = -\frac{2m}{\hbar^2 q} \int_0^\infty dr_1\, r_1\, U(r_1)\, \sin(qr_1). \tag{4.90}
$$

This applies to any elastic scattering process for which the
scattering potential energy is spherically symmetric.


We now study the special case where an incident particle of charge
$ze$ is elastically scattered by the screened Coulomb potential of a
target atomic nucleus of charge $Ze$. This process represents the
basic mechanism of contrast formation in a transmission electron
microscope, for example. We now assume that the spherically symmetric
potential $U(r_1)$ is represented by the screened Coulomb potential

$$
U(r_1) = \frac{Zze^2}{4\pi\epsilon_0 r_1}\, \exp(-\alpha r_1). \tag{4.91}
$$

Physically, this means that the bare charge of the scattering nucleus
is screened by the electron charges of the target atom. This
limits the spatial extent of the scattering region, compared with a
bare, unscreened nuclear charge. Substituting, we find (4.90, 4.91)

$$
f(q) = -\frac{mZze^2}{2\pi\epsilon_0\hbar^2}\, \frac{1}{q} \int_0^\infty dr_1\, e^{-\alpha r_1} \sin(qr_1). \tag{4.92}
$$

Making the substitutions

$$
\xi = qr_1, \qquad \rho = \frac{\alpha}{q}, \tag{4.93}
$$

we obtain (4.92)

$$
f(q) = -\frac{mZze^2}{2\pi\epsilon_0\hbar^2}\, \frac{1}{q^2} \int_0^\infty d\xi\, e^{-\rho\xi} \sin\xi
= -\frac{mZze^2}{2\pi\epsilon_0\hbar^2}\, \frac{1}{q^2 + \alpha^2}, \tag{4.94}
$$

where we have made use of

$$
\int_0^\infty d\xi\, e^{-\rho\xi} \sin\xi = \frac{1}{1 + \rho^2}. \tag{4.95}
$$

For an incident electron with $z = 1$, this takes the form

$$
f(q) = -\frac{4\pi Z}{137\, \lambda_C\, (q^2 + \alpha^2)}, \tag{4.96}
$$

where we have made use of

$$
\frac{e^2}{4\pi\epsilon_0\hbar c} = \frac{1}{137}, \tag{4.97}
$$

and the Compton wavelength $\lambda_C$ for the electron, defined by

$$
\lambda_C = \frac{h}{mc} = 0.002426\ \text{nm}. \tag{4.98}
$$
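
The radial integral behind (4.92) to (4.94) is easy to check
numerically. Since the amplitude (4.92) contains $1/q$ times
$\int_0^\infty e^{-\alpha r_1}\sin(qr_1)\,dr_1$, the integral itself must
equal $q/(q^2 + \alpha^2)$. The Python sketch below tests this with a
simple trapezoidal rule; the cutoff `r_max`, the step count, and the
sample values of $q$ and $\alpha$ (arbitrary inverse-length units) are
assumptions made for illustration.

```python
import math

def radial_integral(q, alpha, r_max=60.0, n=200000):
    """Trapezoidal estimate of int_0^inf exp(-alpha r) sin(q r) dr.

    The upper limit is truncated at r_max, where exp(-alpha r) is negligible.
    """
    h = r_max / n
    total = 0.0
    for i in range(1, n):
        r = i * h
        total += math.exp(-alpha * r) * math.sin(q * r)
    # Endpoint terms: the integrand is zero at r = 0 and tiny at r_max.
    total += 0.5 * math.exp(-alpha * r_max) * math.sin(q * r_max)
    return total * h

alpha = 1.0
checks = [(radial_integral(q, alpha), q / (q ** 2 + alpha ** 2))
          for q in (0.5, 2.0, 5.0)]
for numeric, closed in checks:
    print(numeric, closed)
```

The two columns should agree closely, which confirms the Lorentzian
dependence $1/(q^2 + \alpha^2)$ of the scattering amplitude.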


The differential cross section $\sigma$ is then given (4.75, 4.94) by

$$
\sigma(q) = \left(\frac{mZze^2}{2\pi\epsilon_0\hbar^2}\right)^2 \frac{1}{(q^2 + \alpha^2)^2}. \tag{4.99}
$$


Figure 4.6: Momentum transfer and scattering angle.


The quantity $q$ is related to the scattering angle $\vartheta$ by

$$
q = 2k\sin(\vartheta/2), \tag{4.100}
$$

as is evident from Figure 4.6. Substituting, we find (4.99, 4.100)

$$
\sigma(\vartheta) = \left(\frac{mZze^2}{2\pi\epsilon_0\hbar^2}\right)^2 \frac{1}{\left[4k^2\sin^2(\vartheta/2) + \alpha^2\right]^2}. \tag{4.101}
$$

In the limit $\alpha \to 0$, we obtain (4.77, 4.101)

$$
\sigma(\vartheta) = \left[\frac{Zze^2}{16\pi\epsilon_0 H \sin^2(\vartheta/2)}\right]^2, \tag{4.102}
$$

where $H$ is the total energy eigenvalue of the incident particle. This
is equal to the kinetic energy of the particle outside of the
scattering field. With $k = Zze^2/4\pi\epsilon_0$ in (4.44), this is
identical with the result for Rutherford scattering (4.54) by an
unscreened nucleus, derived previously from classical mechanics. This
remarkable equivalence between the classical and quantum mechanical
results only holds true for the scattering potential energy inversely
proportional to the separation $r$.


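We now proceed to integrate the differential cross section $\sigma$
over all solid angles $d\Omega$, to obtain the total cross section
$\sigma_e$ for elastic scattering as follows:

$$
\sigma_e = \int_{4\pi} \sigma(q)\, d\Omega, \tag{4.103}
$$

where $d\Omega = 2\pi \sin\vartheta\, d\vartheta$. We notice that
(4.100)

$$
\frac{d\Omega}{dq} = \frac{d\Omega}{d\vartheta}\, \frac{d\vartheta}{dq}
= 2\pi \sin\vartheta \cdot \frac{1}{k\cos(\vartheta/2)}
= \frac{2\pi q}{k^2}, \tag{4.104}
$$

where $0 < q < 2k$. It follows that (4.99, 4.103, 4.104)

$$
\sigma_e = \frac{2\pi}{k^2} \int_0^{2k} \sigma(q)\, q\, dq
= \left(\frac{mZze^2}{2\pi\epsilon_0\hbar^2}\right)^2 \frac{4\pi}{\alpha^2\left(4k^2 + \alpha^2\right)}. \tag{4.105}
$$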

In the limit $\alpha \to 0$, where the nuclear charge is unscreened,
the total cross section becomes infinite. The Coulomb force is
therefore said to have infinite range. Physically, $\alpha$ represents
the reciprocal of the radius of the atomic electron cloud. It is
approximately

$$
\alpha \approx \frac{Z^{1/3}}{a_0}, \tag{4.106}
$$

where $a_0$ is the Bohr radius of the hydrogen atom given by

$$
a_0 = \frac{4\pi\epsilon_0\hbar^2}{e^2 m} = \frac{137\,\lambda_C}{2\pi} = 0.0529\ \text{nm}. \tag{4.107}
$$


For incident energy $H > 1$ keV, we have $4k^2 \gg \alpha^2$, in which
case (4.77, 4.105)

$$
\sigma_e \approx \left(\frac{mZze^2}{2\pi\epsilon_0\hbar^2}\right)^2 \frac{\pi}{k^2\alpha^2}
= \frac{\pi\hbar^2}{2mH\alpha^2} \left(\frac{mZze^2}{2\pi\epsilon_0\hbar^2}\right)^2. \tag{4.108}
$$

Taking

$$
H = \tfrac{1}{2}mv^2, \qquad \beta = v/c, \qquad z = 1, \tag{4.109}
$$

we obtain a useful approximation for the total elastic cross section
for an incident electron (4.106, 4.108, 4.109) as

$$
\sigma_e = \frac{Z^{4/3}\lambda_C^2}{\pi\beta^2} = 1.9 \times 10^{-6}\, \frac{Z^{4/3}}{\beta^2}\ \text{nm}^2. \tag{4.110}
$$

Knowing the total cross section, we can now estimate the mean
free path $\mu_e$ which an electron travels between scattering events.
This is

$$
\mu_e = \frac{1}{N\sigma_e} = \frac{A}{N_0\,\rho\,\sigma_e}, \tag{4.111}
$$

where $N$ = number of scattering centers per unit volume, $A$ =
atomic weight, $N_0$ = Avogadro’s number, and $\rho$ = mass density.
For 100 keV electrons incident on silicon ($Z = 14$, $A = 28$
gm/mole, $\rho = 2.4$ gm/cm³), we find $\mu_e = 93$ nm for the mean
free path.
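
The numbers in this example can be reproduced with a few lines of
Python. The sketch below evaluates (4.110) and (4.111) for silicon.
It assumes that $\beta$ is computed from the relativistic relation
between kinetic energy and velocity, which is a choice made here and
is not stated in the text; the constants (electron rest energy 511
keV, Avogadro's number, $\lambda_C = 0.002426$ nm) and all function
names are likewise assumptions of this sketch.

```python
import math

LAMBDA_C_NM = 0.002426    # electron Compton wavelength h/(m c), in nm
REST_ENERGY_KEV = 511.0   # electron rest energy m c^2, in keV
AVOGADRO = 6.022e23       # atoms per mole

def beta_squared(kinetic_kev):
    """(v/c)^2 from the relativistic kinetic energy (assumed convention)."""
    gamma = 1.0 + kinetic_kev / REST_ENERGY_KEV
    return 1.0 - 1.0 / gamma ** 2

def sigma_elastic_nm2(Z, kinetic_kev):
    """Total elastic cross section (4.110), in nm^2."""
    return Z ** (4.0 / 3.0) * LAMBDA_C_NM ** 2 / (math.pi * beta_squared(kinetic_kev))

def mean_free_path_nm(Z, atomic_weight, density_g_cm3, kinetic_kev):
    """Elastic mean free path (4.111), in nm."""
    # Number density: atoms per cm^3 converted to atoms per nm^3 (1 cm^3 = 1e21 nm^3).
    n_per_nm3 = AVOGADRO * density_g_cm3 / atomic_weight * 1e-21
    return 1.0 / (n_per_nm3 * sigma_elastic_nm2(Z, kinetic_kev))

sigma_si = sigma_elastic_nm2(14, 100.0)
mfp_si = mean_free_path_nm(14, 28.0, 2.4, 100.0)
print(sigma_si, mfp_si)
```

Under these assumptions the mean free path comes out close to the
value of about 93 nm quoted above. Using the nonrelativistic
$\beta^2 = 2H/mc^2$ instead gives a somewhat longer path.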


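For fast electrons incident on a film of thickness of the order of
the mean free path, the average scattering angle is quite small. In
this case, we can approximate

$$
\sin\frac{\vartheta}{2} \approx \frac{\vartheta}{2}. \tag{4.112}
$$

The differential cross section $\sigma(\vartheta)$ is then
approximately given (4.77, 4.101, 4.112) by

$$
\sigma(\vartheta) \approx \left(\frac{Zze^2}{4\pi\epsilon_0 H}\right)^2 \frac{1}{\left(\vartheta^2 + \vartheta_W^2\right)^2}, \tag{4.113}
$$

where we have defined (4.106)

$$
\vartheta_W = \frac{\alpha}{k} = \frac{Z^{1/3}\lambda}{2\pi a_0}, \tag{4.114}
$$

where $\lambda = h/p$ is the particle wavelength. The angle
$\vartheta_W$ is called the Wentzel screening angle. For 100 keV
electrons incident on silicon ($Z = 14$, $\lambda = 0.0037$ nm), we
find $\vartheta_W = 0.027$ rad, consistent with our assumption of
small angles. The total cross section is (4.103, 4.112, 4.113)

$$
\sigma_e = \int \sigma(\vartheta)\, d\Omega
\approx \left(\frac{Zze^2}{4\pi\epsilon_0 H}\right)^2 \int_0^\infty \frac{2\pi\vartheta\, d\vartheta}{\left(\vartheta^2 + \vartheta_W^2\right)^2}
= \left(\frac{Zze^2}{4\pi\epsilon_0 H}\right)^2 \frac{\pi}{\vartheta_W^2}. \tag{4.115}
$$

For 100 keV electrons incident on silicon, this yields
$\sigma_e = 1.8 \times 10^{-4}$ nm². We can form an angular
distribution which is normalized to unity as
$\sigma(\vartheta)/\sigma_e$. This is

$$
\sigma_1(\vartheta) = \frac{\vartheta_W^2}{\pi}\, \frac{1}{\left(\vartheta^2 + \vartheta_W^2\right)^2}, \tag{4.116}
$$

where

$$
2\pi \int_0^\infty \sigma_1(\vartheta)\, \vartheta\, d\vartheta = 1. \tag{4.117}
$$

The normalized distribution $\sigma_1$ will prove useful in the theory
of small angle plural scattering.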
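
As a check on (4.114) and (4.117), the following Python sketch
computes the Wentzel screening angle for 100 keV electrons on silicon
and verifies the normalization of $\sigma_1$ numerically. The Bohr
radius value 0.0529 nm, the finite upper limit of integration, and the
function names are assumptions made for this illustration.

```python
import math

def wentzel_angle(Z, wavelength_nm, a0_nm=0.0529):
    """Wentzel screening angle (4.114), in radians."""
    return Z ** (1.0 / 3.0) * wavelength_nm / (2.0 * math.pi * a0_nm)

def sigma1(theta, theta_w):
    """Normalized small-angle distribution (4.116)."""
    return theta_w ** 2 / math.pi / (theta ** 2 + theta_w ** 2) ** 2

def normalization(theta_w, theta_max, n=200000):
    """Midpoint rule for 2 pi int_0^theta_max sigma1(theta) theta dtheta."""
    h = theta_max / n
    s = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        s += sigma1(t, theta_w) * t
    return 2.0 * math.pi * s * h

theta_w = wentzel_angle(14, 0.0037)
# The infinite upper limit of (4.117) is replaced by 1000 screening angles.
norm = normalization(theta_w, theta_max=1000.0 * theta_w)
print(theta_w, norm)
```

The screening angle should come out near 0.027 rad, and the integral
should be close to 1. Most of the weight lies within a few
$\vartheta_W$ of the forward direction.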


4.5 Perturbation theory 


At this point we describe what happens when a quantum me- 
chanical system experiences a small perturbation from its initial 
undisturbed state. This will provide a very useful mathematical 
tool to further understand scattering. 


We consider a general system, which is described by a Hamiltonian
operator $\hat{H}_0$ satisfying

$$
\hat{H}_0\, \psi(\mathbf{x}, t) = i\hbar\, \frac{\partial}{\partial t}\psi(\mathbf{x}, t), \tag{4.118}
$$

where $\psi(\mathbf{x},t)$ is the eigenfunction, and the Hamiltonian
$\hat{H}_0$ is assumed to have no explicit time dependence. The
eigenfunction for the $j$th state is given by

$$
\psi_j(\mathbf{x}, t) = u_j(\mathbf{x})\, e^{-iH_j t/\hbar}. \tag{4.119}
$$

A linear superposition of eigenfunctions $\psi_j(\mathbf{x},t)$ yields
the state function

$$
\Psi_0(\mathbf{x}, t) = \sum_j a_j\, u_j(\mathbf{x})\, e^{-iH_j t/\hbar}, \tag{4.120}
$$

where $a_j$ = const, and $|a_j|^2$ is the probability that a single,
precise measurement of the total energy will yield the eigenvalue
$H_j$.


This description is quite general, and applies to a variety of
quantum mechanical systems. As examples the system might consist of


• a free particle,


• a particle in a general electromagnetic potential,


• a particle in the presence of the screened Coulomb potential
of a target nucleus,


• a particle in the presence of an atom consisting of a nucleus
and a cloud of electrons.


We now introduce a perturbation, by assuming a Hamiltonian $\hat{H}$
consisting of two terms,

$$
\hat{H} = \hat{H}_0 + \hat{H}_1(t). \tag{4.121}
$$

The first term $\hat{H}_0$ is the unperturbed Hamiltonian in the
absence of any interaction between the constituent parts of the
system. The second term $\hat{H}_1$ is a perturbation representing the
interaction. In general this perturbation depends on the time $t$.


Since the unperturbed eigenfunctions $\psi_j(\mathbf{x}, t)$ form a
complete orthonormal set, it is always possible to expand the
perturbed state function $\Psi(\mathbf{x}, t)$ as a linear combination
of the unperturbed eigenfunctions. Thus

$$
\Psi(\mathbf{x}, t) = \sum_j a_j(t)\, u_j(\mathbf{x})\, e^{-iH_j t/\hbar}, \tag{4.122}
$$

where the coefficients $a_j(t)$ are now considered to depend on the
time $t$, owing to the time dependence of the perturbation
$\hat{H}_1(t)$. Substituting this into

$$
\left(\hat{H}_0 + \hat{H}_1\right) \Psi(\mathbf{x}, t) = i\hbar\, \frac{\partial}{\partial t}\Psi(\mathbf{x}, t) \tag{4.123}
$$

and subtracting out the unperturbed terms, we find

$$
\sum_j a_j \left(\hat{H}_1 u_j\right) e^{-iH_j t/\hbar}
= i\hbar \sum_j \frac{da_j}{dt}\, u_j(\mathbf{x})\, e^{-iH_j t/\hbar}. \tag{4.124}
$$

Multiplying from the left by $u_i^*(\mathbf{x})$ and integrating over
the volume,

$$
i\hbar\, \frac{d}{dt} a_i(t) = \sum_j a_j(t)\, e^{i(H_i - H_j)t/\hbar} \int d^3x\, u_i^*(\mathbf{x}) \left[\hat{H}_1 u_j(\mathbf{x})\right], \tag{4.125}
$$

where we have made use of the orthonormality of the $u_j$, namely

$$
\int d^3x\, u_i^*(\mathbf{x})\, u_j(\mathbf{x}) = \delta_{ij}. \tag{4.126}
$$

For brevity we make use of the Dirac notation as follows:

$$
\int d^3x\, u_i^*(\mathbf{x}) \left[\hat{H}_1 u_j(\mathbf{x})\right] = \langle i|\hat{H}_1|j\rangle. \tag{4.127}
$$

We further abbreviate

$$
H_i - H_j = H_{ij}. \tag{4.128}
$$

In this notation we have

$$
i\hbar\, \frac{d}{dt} a_i(t) = \sum_j a_j(t)\, e^{iH_{ij}t/\hbar}\, \langle i|\hat{H}_1|j\rangle. \tag{4.129}
$$

This equation describes the time evolution of the amplitude $a_i(t)$
in the presence of the perturbation. It is exact, since no
approximation has been made to this point.


We now introduce several approximating assumptions as follows:


• Initially, only one state is populated, and all other states
are unpopulated. Mathematically, $a_j(0) = 1$ for one specific
value of the index $j$, and $a_k(0) = 0$ for all other values
$k \neq j$.


• The interaction begins instantaneously at time $t = 0$, and
remains constant thereafter. In this respect we regard $\hat{H}_1$ as
time independent after $t = 0$.


• The change in each $a_j(t)$ over time is small throughout the
time of interaction. This is equivalent to the perturbation
being weak over the time scale of interest. Mathematically,
$a_j(t) \approx a_j(0)$ for all $j$ and all $t$.


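Based on these assumptions,

$$
\frac{d}{dt} a_i(t) = \frac{1}{i\hbar}\, \langle i|\hat{H}_1|j\rangle\, e^{iH_{ij}t/\hbar}. \tag{4.130}
$$

The index $i$ runs over a multiplicity of final states with energy
$H_i$ close to $H_j$. Integrating over time,

$$
\begin{aligned}
a_i(t) &= \frac{1}{i\hbar}\, \langle i|\hat{H}_1|j\rangle \int_0^t e^{iH_{ij}t'/\hbar}\, dt' \\
&= -\frac{1}{H_{ij}}\, \langle i|\hat{H}_1|j\rangle \left(e^{iH_{ij}t/\hbar} - 1\right) \\
&= -\frac{2i}{H_{ij}}\, \langle i|\hat{H}_1|j\rangle\, \exp\left(\frac{iH_{ij}t}{2\hbar}\right) \sin\left(\frac{H_{ij}t}{2\hbar}\right).
\end{aligned} \tag{4.131}
$$

The probability $|a_i(t)|^2$ of finding the final state $i$ at time
$t$ is then

$$
|a_i(t)|^2 = \left|\frac{2}{H_{ij}}\, \langle i|\hat{H}_1|j\rangle\, \sin\left(\frac{H_{ij}t}{2\hbar}\right)\right|^2. \tag{4.132}
$$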

The probability of transition from the initial state $j$ to all final
states is found by summing over the final states $i$,

$$
P(t) = \sum_i |a_i(t)|^2. \tag{4.133}
$$

In the important case of an unbound system where the final states
$i$ approach a continuum, this becomes

$$
P(t) = \int dH_i\, \rho(H_i)\, |a_i(t)|^2, \tag{4.134}
$$

where $\rho(H_i)$ is the density of states with respect to energy.
Assuming all final energies $H_i$ are close to the initial energy
$H_j$ (weak perturbation), we can approximate

$$
\rho(H_i) \approx \rho(H). \tag{4.135}
$$

Additionally, we approximate the matrix element by a single constant
value

$$
\langle i|\hat{H}_1|j\rangle = \langle H\rangle. \tag{4.136}
$$

As a result, we can bring $\langle H\rangle$ and $\rho(H)$ outside the
integral. Substituting,

$$
P(t) = 4\, |\langle H\rangle|^2\, \rho(H) \int_{-\infty}^{\infty} \frac{dH_{ij}}{H_{ij}^2}\, \sin^2\left(\frac{H_{ij}t}{2\hbar}\right). \tag{4.137}
$$

Making the substitution

$$
\xi = \frac{H_{ij}t}{2\hbar}, \tag{4.138}
$$

we find

$$
P(t) = \frac{2\pi}{\hbar}\, \rho(H)\, |\langle H\rangle|^2\, t, \tag{4.139}
$$

where we have made use of the integral

$$
\int_{-\infty}^{\infty} d\xi\, \frac{\sin^2\xi}{\xi^2} = \pi. \tag{4.140}
$$

The transition rate from a single initial state to all final states is
then

$$
\frac{dP}{dt} = \frac{2\pi}{\hbar}\, \rho(H)\, |\langle H\rangle|^2, \tag{4.141}
$$

where the transition rate is the probability per unit time for the
transition from a single initial state to all available final states.
This result is quite general, in that it applies to many diverse
phenomena. It is called the golden rule of perturbation theory. In
the following sections we will proceed to apply it to the scattering
problem.
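
The integral (4.140), on which the linear growth of $P(t)$ depends,
can be confirmed numerically. The Python sketch below uses a midpoint
rule on a large but finite interval; the cutoff of 2000 and the step
count are assumptions of this sketch, and the neglected tails
contribute roughly the reciprocal of the cutoff.

```python
import math

def sinc_squared_integral(limit, n):
    """Midpoint rule for int_{-limit}^{limit} sin(x)^2 / x^2 dx.

    With an even number of steps the midpoints never land on x = 0.
    """
    h = 2.0 * limit / n
    total = 0.0
    for i in range(n):
        x = -limit + (i + 0.5) * h
        total += (math.sin(x) / x) ** 2
    return total * h

value = sinc_squared_integral(limit=2000.0, n=400000)
print(value, math.pi)
```

The result should fall slightly below $\pi$, by about the size of the
truncated tails.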




4.6 Perturbation solution for elastic 
scattering 


We now proceed to apply the equation (4.141) to the problem of
elastic scattering. We assume that the initial state corresponds to
an incident plane wave, where the free-particle eigenfunction
$u_0(\mathbf{x})$ is given by

$$
u_0(\mathbf{x}) = \frac{1}{\sqrt{V}}\, e^{i\mathbf{k}_0 \cdot \mathbf{x}}, \tag{4.142}
$$

where $V$ is the volume, $\mathbf{k}_0$ is the incident wave vector,
and $\hbar\mathbf{k}_0$ is the incident momentum. The final state at a
large distance from the scattering center is a plane wave given by

$$
u(\mathbf{x}) = \frac{1}{\sqrt{V}}\, e^{i\mathbf{k} \cdot \mathbf{x}}, \tag{4.143}
$$

where $\mathbf{k}$ is the scattered wave vector, and
$\hbar\mathbf{k}$ is the momentum after scattering of the incident
particle. In the initial and final states the particle is assumed to
be far outside the region of scattering, hence the free-particle
eigenfunctions.


The unperturbed Hamiltonian $\hat{H}_0$ is then the free particle
Hamiltonian, and the perturbation Hamiltonian $\hat{H}_1$ is the
scattering potential energy

$$
\hat{H}_1 = U(\mathbf{x}). \tag{4.144}
$$


We take the origin of coordinates x to coincide with the scattering 
center of the equivalent one-body problem. In the case of an elec- 
tron incident on an atom, the origin coincides with the position of 
the atomic nucleus. 


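The matrix element $\langle i|\hat{H}_1|j\rangle$ is then given by

$$ \langle i|\hat{H}_1|j\rangle = \frac{1}{V}\int d^3x\,U(\mathbf{x})\,e^{i\mathbf{q}\cdot\mathbf{x}}, \tag{4.145} $$

where we have defined the difference vector $\mathbf{q} = \mathbf{k} - \mathbf{k}_0$.
The quantity $\hbar\mathbf{q}$ is the momentum transferred in the collision.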
The density of states with respect to energy was found earlier
(3.73) to be

$$ \rho(H) = \frac{4\pi m V}{(2\pi\hbar)^3}\,\sqrt{2mH}, \tag{4.146} $$

where, for a free particle,

$$ H = \frac{\hbar^2 k^2}{2m}. \tag{4.147} $$

Substituting, this is equivalent to

$$ \rho(H) = \frac{m V k}{2\pi^2\hbar^2}. \tag{4.148} $$

The transition rate from the initial state to all final states is (4.141,
4.145, 4.148)

$$ \frac{dP}{dt} = \frac{m k}{\pi\hbar^3 V}\left|\int d^3x\,U(\mathbf{x})\,e^{i\mathbf{q}\cdot\mathbf{x}}\right|^2. \tag{4.149} $$


This represents the probability per unit time of scattering into all
solid angles. For the differential cross section we seek the number
of particles per unit time scattered into a particular solid angle
element $d\Omega$. This is

$$ \frac{dN}{dt} = \frac{dP}{dt}\,\frac{d\Omega}{4\pi}. \tag{4.150} $$

The definition of the differential cross section $\sigma$ was given earlier
by (4.39)

$$ \frac{dN}{dt} = \sigma\,S_0\,d\Omega, \tag{4.151} $$

where $S_0$ is the incident flux (particles per unit time per unit
transverse area) given by

$$ S_0 = |u_0(\mathbf{x})|^2\,v_0 = \frac{\hbar k_0}{m V}. \tag{4.152} $$
For elastic scattering, $k_0 = k$. That is, the magnitude of the
momentum and therefore the wave vector is unchanged by the
scattering. Solving for the differential cross section $\sigma$, we obtain

$$ \sigma = \left(\frac{m}{2\pi\hbar^2}\right)^2\left|\int d^3x\,U(\mathbf{x})\,e^{i\mathbf{q}\cdot\mathbf{x}}\right|^2 = |f(q)|^2, \tag{4.153} $$

where $f(q)$ is the scattering amplitude derived previously (4.85).
This is the main result of this section.


As a reminder, this applies to an arbitrary scattering potential 
U(x). Mathematically, the equation (4.153) teaches that the scat- 
tering amplitude is the Fourier transform of the scattering poten- 
tial. In practical terms, the angular distribution of the scattering 
is directly measurable, and represents a sensitive probe into the 
detailed form of the scattering potential. 
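As an illustration of the Fourier-transform statement, the sketch below evaluates the integral in (4.153) for a spherically symmetric potential, where the three-dimensional integral reduces to (4π/q) ∫ r U(r) sin(qr) dr. The Yukawa form U(r) = A exp(-r/R)/r is an assumption chosen because its transform, 4πA/(q² + 1/R²), is known in closed form; the function names and parameter values are hypothetical.

```python
import math

def born_integral_numeric(q, amplitude=1.0, screening_length=1.0,
                          r_max=40.0, steps=20_000):
    """Simpson-rule value of (4*pi/q) * int_0^r_max r U(r) sin(q r) dr
    for the assumed Yukawa potential U(r) = amplitude * exp(-r/R) / r."""
    h = r_max / steps  # steps must be even for Simpson's rule

    def integrand(r):
        # r * U(r) * sin(q r); the factor 1/r in U cancels the factor r
        return amplitude * math.exp(-r / screening_length) * math.sin(q * r)

    total = integrand(0.0) + integrand(r_max)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return 4.0 * math.pi / q * total * h / 3.0

def born_integral_analytic(q, amplitude=1.0, screening_length=1.0):
    """Closed-form Fourier transform of the Yukawa potential."""
    return 4.0 * math.pi * amplitude / (q * q + 1.0 / screening_length ** 2)

for q in (0.5, 2.0, 5.0):
    print(q, born_integral_numeric(q), born_integral_analytic(q))
```

Multiplying either value by m/(2πħ²) gives the magnitude of the scattering amplitude in (4.153); squaring it gives the differential cross section for this model potential.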


We have succeeded in reproducing the earlier result by the in- 
dependent use of perturbation theory as an alternative approach. 
From this we deduce that the approximations made here coincide 
with the first Born approximation. 


4.7 Inelastic scattering of a particle by 
a target atom 


A scattering event which transfers energy from the incident parti- 
cle to the target material is called inelastic scattering. The trans- 
ferred energy can be manifest in a variety of secondary processes. 
These include emission of a photon, Auger electron, or ionization 
electron from a target atom. Alternatively they include collective 
excitation of the conduction band electron gas known as a plas- 
mon, or of the target lattice as a phonon. Analysis of the energy 
lost by the primary particle provides important information about 
the chemical and physical composition of the target. At very high 
incident energy, various elementary particles can be created. In- 
elastic scattering is quite complicated, and the subject of an enor- 
mous literature. 


In this study we confine our attention to the primary energy trans- 
fer, without considering the multiplicity of secondary processes. In 
one example, a massive incident particle transfers energy to cause
excitation or ionization of the electrons of a target atom. In a sec- 
ond example, the passage of a fast charged particle causes instan- 
taneous polarization of a dielectric medium. The central problem 
is to calculate the scattering cross section in each case. 


We assume an incident particle of mass m and charge ze, where z is 
an integer. The target is a single atom with atomic number Z. In a
single collision the incident particle transfers a small fraction of its 
energy to the target atom, initially in its ground state. As a result, 
the atom is excited to a higher energy state. In principle this can 
include ionization of the atom. The scattered particle then exits 
to a final free-particle state with reduced energy and momentum. 
The following analysis closely follows the classic paper of Bethe 
[5], which is based on the same perturbation-theoretical approach 
described above. The reader is referred to Egerton [24], who places 
the material in the context of the considerable body of subsequent 
work by others. 


According to the foregoing analysis, calculation of the differential
scattering cross section $\sigma$ is reduced to finding the appropriate
matrix element $\langle i|\hat{H}_1|j\rangle$ between the initial and final
states of the system. In this case, the system consists of a scattering
particle and a single target atom. The origin of coordinates coincides
with the nucleus of the target atom. The instantaneous position of the
scattering particle we denote by $\mathbf{x}$, and the instantaneous
positions of the Z atomic electrons we denote by
$(\mathbf{x}_1,\ldots,\mathbf{x}_Z)$.


The initial and final eigenfunctions, respectively, can be
represented by

$$ u_0(\mathbf{x};\mathbf{x}_1,\ldots,\mathbf{x}_Z) = \frac{1}{\sqrt{V}}\,e^{i\mathbf{k}_0\cdot\mathbf{x}}\,U_0(\mathbf{x}_1,\ldots,\mathbf{x}_Z), $$

$$ u_n(\mathbf{x};\mathbf{x}_1,\ldots,\mathbf{x}_Z) = \frac{1}{\sqrt{V}}\,e^{i\mathbf{k}\cdot\mathbf{x}}\,U_n(\mathbf{x}_1,\ldots,\mathbf{x}_Z), \tag{4.154} $$

where the subscripts 0 and n refer to the ground state and the nth
excited states of the atom, respectively. The quantities $U_0$ and $U_n$
represent the spatial wave function of the atom before and after
the collision. The time dependence has been integrated out in the
perturbation-theoretical approach described above.


The scattering particle approaches from a large distance with
incident wave vector $\mathbf{k}_0$, and exits with scattered wave vector
$\mathbf{k}$. In this inelastic scattering case, the incident and scattered
wave vectors differ in both magnitude and direction. The eigenfunctions
$u_0$ and $u_n$ represent solutions to the Schrödinger equation, where one
must be careful to include the dependence on all 3(Z + 1) spatial
degrees of freedom, here labeled $(\mathbf{x};\mathbf{x}_1,\ldots,\mathbf{x}_Z)$.


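The matrix element is given by

$$ \langle i|\hat{H}_1|j\rangle = \int u_n^{*}\,U\,u_0\;d^3x\prod_{j=1}^{Z} d^3x_j, \tag{4.155} $$

where $U(\mathbf{x};\mathbf{x}_1,\ldots,\mathbf{x}_Z)$ is the potential energy
arising from the Coulomb interaction. This is

$$ U = \frac{e^2 z}{4\pi\varepsilon_0}\left(\frac{Z}{|\mathbf{x}|} - \sum_{j=1}^{Z}\frac{1}{|\mathbf{x}-\mathbf{x}_j|}\right). \tag{4.156} $$

The first term in large parentheses represents the interaction
between the scattering particle and the bare atomic nucleus, and the
second term represents the sum of interactions between the
scattering particle and the atomic electrons.

The matrix element takes the form

$$ \langle i|\hat{H}_1|j\rangle = \frac{e^2 z}{4\pi\varepsilon_0 V}\int d^3x\prod_{j=1}^{Z} d^3x_j\;e^{-i\mathbf{q}\cdot\mathbf{x}}\left(\frac{Z}{|\mathbf{x}|} - \sum_{j=1}^{Z}\frac{1}{|\mathbf{x}-\mathbf{x}_j|}\right)U_n^{*}\,U_0, \tag{4.157} $$

where $\mathbf{q} = \mathbf{k} - \mathbf{k}_0$. Following Bethe [5] we perform
the integral over $d^3x$ first. This integral is of the form

$$ \int d^3x\,\frac{e^{-i\mathbf{q}\cdot\mathbf{x}}}{|\mathbf{x}-\mathbf{x}_j|} = \frac{4\pi}{q^2}\,e^{-i\mathbf{q}\cdot\mathbf{x}_j}. \tag{4.158} $$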
The truth of this equation can easily be established by applying
the Laplacian operator $\nabla_j^2$ to both sides, where the subscript
denotes differentiation with respect to the coordinates $\mathbf{x}_j$.
Taking the Laplacian inside the integral on the left side, we make use of

$$ \nabla_j^2\,\frac{1}{|\mathbf{x}-\mathbf{x}_j|} = -4\pi\,\delta(\mathbf{x}-\mathbf{x}_j), \tag{4.159} $$

which is well-known from electrostatic potential theory [48]. Using
the property of the delta-function, both sides are equal to
$-4\pi\,e^{-i\mathbf{q}\cdot\mathbf{x}_j}$, thus establishing the identity.
As a special case we have

$$ \int d^3x\,\frac{e^{-i\mathbf{q}\cdot\mathbf{x}}}{|\mathbf{x}|} = \frac{4\pi}{q^2}. \tag{4.160} $$


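The matrix element is reduced to

$$ \langle i|\hat{H}_1|j\rangle = \frac{e^2 z}{\varepsilon_0 V q^2}\int\left(Z - \sum_{j=1}^{Z} e^{-i\mathbf{q}\cdot\mathbf{x}_j}\right)U_n^{*}\,U_0\prod_{j=1}^{Z} d^3x_j, \tag{4.161} $$

where the integral is now only over the coordinates of the Z
atomic electrons $\mathbf{x}_j$. Making use of the orthonormality of the set
$U_n(\mathbf{x}_1,\ldots,\mathbf{x}_Z)$ this further reduces to

$$ \langle i|\hat{H}_1|j\rangle = \frac{e^2 z}{\varepsilon_0 V q^2}\left[Z\,\delta_{n0} - \int\left(\sum_{j=1}^{Z} e^{-i\mathbf{q}\cdot\mathbf{x}_j}\right)U_n^{*}\,U_0\prod_{j=1}^{Z} d^3x_j\right]. \tag{4.162} $$

At this point we define a dimensionless quantity $\varepsilon_n(q)$ given by

$$ \varepsilon_n(q) = -Z\,\delta_{n0} + \int\left(\sum_{j=1}^{Z} e^{-i\mathbf{q}\cdot\mathbf{x}_j}\right)U_n^{*}\,U_0\prod_{j=1}^{Z} d^3x_j, \tag{4.163} $$

where $\varepsilon_n$ is a property of the target atom in the nth excited state.
The first term represents the elastic scattering and the remainder
represents the inelastic scattering. The matrix element is then

$$ \langle H\rangle = -\frac{e^2 z}{\varepsilon_0 V q^2}\,\varepsilon_n(q). \tag{4.164} $$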
The transition rate is given by

$$ \frac{dP}{dt} = \frac{2\pi}{\hbar}\,\rho(H)\,|\langle H\rangle|^2, \tag{4.165} $$

where the density of states of the scattered particle is

$$ \rho(H) = \frac{m V k}{2\pi^2\hbar^2}. \tag{4.166} $$

The number of particles per unit time scattered into a solid angle
element $d\Omega$ is given by

$$ \frac{dN}{dt} = \frac{dP}{dt}\,\frac{d\Omega}{4\pi} = \sigma(q)\,S_0\,d\Omega, \tag{4.167} $$

where $S_0$ is the incident intensity given by

$$ S_0 = \frac{\hbar k_0}{m V}. \tag{4.168} $$
The central problem of this section is to calculate the differential
cross section $\sigma(q)$. This is

$$ \sigma(q) = \frac{1}{4\pi S_0}\,\frac{dP}{dt}. \tag{4.169} $$

Substituting, we obtain the result

$$ \sigma_n(q) = \left(\frac{m e^2 z}{2\pi\varepsilon_0\hbar^2 q^2}\right)^2\frac{k}{k_0}\,|\varepsilon_n(q)|^2, \tag{4.170} $$

where the subscript n indicates that the target atom is excited to
the nth state. Energy conservation dictates that

$$ E_0 + \frac{\hbar^2 k_0^2}{2m} = E_n + \frac{\hbar^2 k^2}{2m}, \tag{4.171} $$

where $E_0$ and $E_n$ are the ground state and nth excited state energy
levels, respectively. The final state can consist of atomic excitation
or ionization.
Figure 4.7: Wave vectors for inelastic scattering.


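We recall that $\mathbf{k}_0$, $\mathbf{k}$, and $\mathbf{q}$ are related by

$$ \mathbf{q} = \mathbf{k} - \mathbf{k}_0. \tag{4.172} $$

This is shown in Figure 4.7, where $\theta$ is the scattering angle. The
various magnitudes are related by

$$ q^2 = k^2 + k_0^2 - 2\,k\,k_0\cos\theta. \tag{4.173} $$

Taking the differential of both sides, we have

$$ q\,dq = k\,k_0\sin\theta\,d\theta. \tag{4.174} $$

The solid angle element $d\Omega$ is given by

$$ d\Omega = 2\pi\sin\theta\,d\theta. \tag{4.175} $$

Substituting, this leads to a differential form for the inelastic
scattering cross section as

$$ \sigma_n(q)\,d\Omega = \left(\frac{m e^2 z}{\varepsilon_0\hbar^2}\right)^2\frac{1}{2\pi k_0^2}\,|\varepsilon_n(q)|^2\,\frac{dq}{q^3}. \tag{4.176} $$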

Integrating both sides over all possible values of q, we obtain the
total cross section. This form shows that the total cross section for
inelastic scattering is inversely proportional to the energy of the
incident particle. Elastic scattering has the same inverse
dependence on incident energy. For electron scattering the ratio of the
total cross section $\sigma_i$ for inelastic scattering divided by the total
cross section $\sigma_e$ for elastic scattering is given [24] by


ao (4.177) 


Following Bethe [5], the momentum transfer q is related to the
scattering angle $\theta$ by

$$ q^2 \approx k_0^2\left(\theta^2 + \theta_E^2\right), \tag{4.178} $$

where $\theta_E$ is defined as

$$ \theta_E = \frac{m\,\Delta E}{\hbar^2 k_0^2}, \tag{4.179} $$

and $\Delta E$ is the average energy loss per collision, and is in the range
of a few eV to a few tens of eV, depending on the target material.
Given that the scattering is from the electron cloud of the target
atom, the predominant scattering angles for a fast incident particle
are small, in the range of 1 mrad.
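The kinematics can be checked with a short calculation. The sketch below compares the exact momentum transfer from energy conservation (4.171) and the law of cosines (4.173) with the small-angle form q² ≈ k₀²(θ² + θ_E²), taking θ_E = mΔE/(ħ²k₀²) = ΔE/(2E₀). The example values (a 100 keV electron losing 20 eV) are assumptions for illustration, and the nonrelativistic expressions used here are only a rough guide at that energy.

```python
import math

HBAR = 1.054571817e-34   # J s
M_E = 9.1093837015e-31   # kg, electron rest mass
EV = 1.602176634e-19     # J per eV

def kinematics(e0_ev, de_ev, theta):
    """Return (exact q^2, small-angle q^2, theta_E), nonrelativistic."""
    k0_sq = 2.0 * M_E * e0_ev * EV / HBAR**2
    k_sq = 2.0 * M_E * (e0_ev - de_ev) * EV / HBAR**2   # energy conservation
    k0, k = math.sqrt(k0_sq), math.sqrt(k_sq)
    q_sq_exact = k_sq + k0_sq - 2.0 * k * k0 * math.cos(theta)  # law of cosines
    theta_e = M_E * de_ev * EV / (HBAR**2 * k0_sq)              # equals de/(2*e0)
    q_sq_small = k0_sq * (theta**2 + theta_e**2)
    return q_sq_exact, q_sq_small, theta_e

exact, approx, theta_e = kinematics(100e3, 20.0, 1e-3)
print(theta_e, exact / approx)
```

For these inputs θ_E is 0.1 mrad, and the two expressions for q² agree to about one part in 10⁴ at a scattering angle of 1 mrad.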


4.8 Slowing of a charged particle in a 
dielectric medium 


When a fast charged particle passes through a solid, it interacts 
electromagnetically with many atoms simultaneously. In addition 
conduction band electrons are delocalized, with nonzero proba- 
bility density over a region many atomic diameters in size. The 
incident particle interacts collectively with the electrons of the 
target material. It is worthwhile to consider the interaction in a 
classical approximation. This section closely follows the analysis 
by Landau and Lifshitz [56]. This material is described in more 
detail, and placed in the context of the earlier work of others by 
Egerton [24]. 
We consider a charged particle with velocity $\mathbf{v}$ passing through an
infinite medium with complex dielectric coefficient $\varepsilon(\omega)$. Here
$\omega$ is the temporal angular frequency of the electromagnetic field of
the particle. This presumes that the electromagnetic field is amenable
to Fourier analysis. This is shown schematically in Figure 4.8. The
particle has charge q and velocity $\mathbf{v}$. An electromagnetic field is
experienced at the observation point O due to the passing particle.

Figure 4.8: Particle passing through a dielectric medium.


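We adopt the nonrelativistic approximation, in which magnetic
effects are negligible. The instantaneous electrostatic potential
$\varphi(\mathbf{x})$ evaluated at any position $\mathbf{x}$ obeys Poisson's
equation,

$$ \nabla^2\varphi(\mathbf{x}) = -\frac{\rho(\mathbf{x})}{\varepsilon_0\,\varepsilon}. \tag{4.180} $$

The charge density $\rho$ is due to the particle, and is given by

$$ \rho(\mathbf{x}) = q\,\delta(\mathbf{x} - \mathbf{v}t). \tag{4.181} $$

The potential $\varphi(\mathbf{x})$ can be expressed as a Fourier integral,

$$ \varphi(\mathbf{x}) = \int d^3k\;\tilde{\varphi}(\mathbf{k})\,e^{i\mathbf{k}\cdot\mathbf{x}}. \tag{4.182} $$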
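Applying the Laplacian operator to both sides, we obtain

$$ \nabla^2\varphi(\mathbf{x}) = -\int d^3k\;k^2\,\tilde{\varphi}(\mathbf{k})\,e^{i\mathbf{k}\cdot\mathbf{x}}. \tag{4.183} $$

Separately, the delta function has the integral representation

$$ \delta(\mathbf{x} - \mathbf{v}t) = \frac{1}{(2\pi)^3}\int d^3k\;e^{i\mathbf{k}\cdot(\mathbf{x}-\mathbf{v}t)}. \tag{4.184} $$

Substituting into Poisson's equation above, we obtain

$$ \tilde{\varphi}(\mathbf{k}) = \frac{q}{(2\pi)^3\varepsilon_0}\,\frac{e^{-i(\mathbf{k}\cdot\mathbf{v})t}}{k^2\,\varepsilon(\mathbf{k}\cdot\mathbf{v})}. \tag{4.185} $$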


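The electric field $\mathbf{E}(\mathbf{x})$ is given by

$$ \mathbf{E}(\mathbf{x}) = -\nabla\varphi(\mathbf{x}) = -\int d^3k\;\tilde{\varphi}(\mathbf{k})\left(i\mathbf{k}\,e^{i\mathbf{k}\cdot\mathbf{x}}\right). \tag{4.186} $$

Separately, the electric field $\mathbf{E}(\mathbf{x})$ can be expressed as a
Fourier integral,

$$ \mathbf{E}(\mathbf{x}) = \int d^3k\;\tilde{\mathbf{E}}(\mathbf{k})\,e^{i\mathbf{k}\cdot\mathbf{x}}. \tag{4.187} $$

Substituting, we obtain

$$ \tilde{\mathbf{E}}(\mathbf{k}) = -i\mathbf{k}\,\tilde{\varphi}(\mathbf{k}) = -\frac{iq}{(2\pi)^3\varepsilon_0}\,\frac{\mathbf{k}\,e^{-i(\mathbf{k}\cdot\mathbf{v})t}}{k^2\,\varepsilon(\mathbf{k}\cdot\mathbf{v})}. \tag{4.188} $$

Performing the inverse Fourier transform, and evaluating the
electric field at the particle position $\mathbf{x} = \mathbf{v}t$, we obtain

$$ \mathbf{E}(\mathbf{v}t) = -\frac{iq}{(2\pi)^3\varepsilon_0}\int d^3k\;\frac{\mathbf{k}}{k^2\,\varepsilon(\mathbf{k}\cdot\mathbf{v})}, \tag{4.189} $$

where the exponential factors cancel. The force $\mathbf{F}$ on the particle
is the product of the charge q times the electric field,

$$ \mathbf{F} = -\frac{iq^2}{(2\pi)^3\varepsilon_0}\int d^3k\;\frac{\mathbf{k}}{k^2\,\varepsilon(\omega)}, \tag{4.190} $$

where $\mathbf{k}\cdot\mathbf{v}$ is identified as the angular temporal frequency
$\omega$. The quantity $1/\varepsilon$ is complex, with the real part even and the
imaginary part odd in $\omega$. The real part integrates to zero, while only
the imaginary part survives. We therefore write

$$ \mathbf{F} = \frac{q^2}{(2\pi)^3\varepsilon_0}\int d^3k\;\frac{\mathbf{k}}{k^2}\,\mathrm{Im}\left[\frac{1}{\varepsilon(\omega)}\right]. \tag{4.191} $$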


The direction of the force is opposite to the particle velocity,
indicating slowing of the particle. This is evident from the axial
symmetry of the problem. Assuming the particle moves in a straight
line, the magnitude of the force represents the energy loss per
unit path length. The integral can be evaluated in principle by
resolving the wave vector k into axial and transverse components. 
In order to obtain convergence, one must subtract the vacuum 
contribution with no medium present. This is described in more 
detail by Landau and Lifshitz [56]. 


This represents the main result of this section. This approach has 
the advantage that the complex dielectric constant can be mea- 
sured by light-optical means. 
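The parity argument used to discard the real part of 1/ε can be illustrated with a simple model. The sketch below assumes a Drude dielectric function, ε(ω) = 1 - ω_p²/[ω(ω + iγ)], which is not taken from the text; the function name and the values of ω_p and γ are hypothetical. It checks that Re(1/ε) is even and Im(1/ε) is odd in ω, and that Im(1/ε) is negative for ω > 0, which is what makes the force oppose the velocity.

```python
def inverse_epsilon_drude(omega, omega_p=1.0, gamma=0.1):
    """1/epsilon(omega) for the assumed Drude form
    epsilon = 1 - omega_p^2 / (omega * (omega + 1j*gamma))."""
    eps = 1.0 - omega_p**2 / (omega * (omega + 1j * gamma))
    return 1.0 / eps

for w in (0.3, 1.0, 2.5):
    plus = inverse_epsilon_drude(w)
    minus = inverse_epsilon_drude(-w)
    # Even real part: difference vanishes. Odd imaginary part: sum vanishes.
    print(w, plus.real - minus.real, plus.imag + minus.imag, plus.imag)
```

In this model the loss function -Im(1/ε) peaks near ω = ω_p, so the dominant contribution to the slowing force comes from wave vectors with k·v close to the plasma frequency.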


4.9 Small angle plural scattering of 
fast electrons 


It is often the case that the thickness of a scattering material
film exceeds the mean free path for the incident particle. A very 
thick bulk target can stop or reflect the incident beam. In this 
case, the scattering is adequately described by the diffusion equa- 
tion. A commonly occurring case of considerable interest is where 
the scattering film is several mean free paths in thickness. This 
case is referred to as plural scattering. Diffusion has not yet set 
in, and it is necessary to describe the scattering in terms of a 
classical transport equation. This is permissible for an amorphous
target, as typically the phase coherence of elastic scattering has 
been lost due to the random distribution of scattering centers, and 
the presence of inelastic processes. This section is based on earlier 
published work by Snyder and Scott [81], Keil, Zeitler, and Zinn 
[50], Crewe and Groves [21], and Groves [38]. 


This is distinctly different from elastic electron scattering in a 
crystal, where phase coherence is maintained. Here constructive 
interference occurs at the Bragg angles, giving rise to the familiar 
diffraction patterns. The following discussion does not apply to 
the diffraction case. 


The mean free path is given by

$$ \mu = \frac{1}{N\sigma}, \tag{4.192} $$

where N is the number of atoms per unit volume, and $\sigma$ is the
total scattering cross section. For fast electrons in a typical solid,
$\mu$ for elastic scattering is proportional to the incident energy in the
first Born approximation. Consequently, $\mu$ ranges from a few tens
of nanometers at an incident energy of 10 keV to a few hundreds
of nanometers at 1 MeV. The sections observed in a transmission
electron microscope must be thin relative to the mean free path, in 
order to avoid degradation of the image due to multiple scattering 
of the beam electrons. As this is not always possible, multiple scat- 
tering must be considered in the image formation. In this section 
we derive a method for understanding the scattering as a function 
of the sample thickness, measured in units of the mean free path. 


We define a dimensionless quantity $n = z/\mu$, which we call the
reduced thickness, where z is the thickness, measured in units
of length. The probability P of an electron undergoing exactly j
scattering events in the reduced thickness n is governed by Poisson
statistics, namely

$$ P_j(n) = \frac{n^j}{j!}\,e^{-n}. \tag{4.193} $$

This can be appreciated by calculating the expectation value of
the number of scattering events $\bar{j}$ for a given reduced thickness n.
It is

$$ \bar{j} = \sum_{j=0}^{\infty} j\,P_j(n) = n\,e^{-n}\sum_{j=1}^{\infty}\frac{n^{j-1}}{(j-1)!} = n. \tag{4.194} $$


The reduced thickness n is just the average number of scattering 
events. In this study we confine the discussion to small values of 
n, between zero and twenty, where diffusion has not yet set in. 
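A small numerical example may help fix the orders of magnitude. The sketch below computes a mean free path, the reduced thickness of a film, and the Poisson probabilities. The number density and cross section are hypothetical values chosen to give a mean free path of 100 nm; they are not taken from the text.

```python
import math

def mean_free_path(number_density, cross_section):
    """mu = 1 / (N * sigma), in meters for SI inputs."""
    return 1.0 / (number_density * cross_section)

def poisson(j, n):
    """Probability of exactly j scattering events at reduced thickness n."""
    return n**j / math.factorial(j) * math.exp(-n)

# Hypothetical inputs: N = 1e29 atoms/m^3 and sigma = 1e-22 m^2.
mu = mean_free_path(1.0e29, 1.0e-22)
n = 300e-9 / mu                           # reduced thickness of a 300 nm film
probs = [poisson(j, n) for j in range(60)]
mean_events = sum(j * p for j, p in enumerate(probs))
print(mu, n, probs[0], mean_events)
```

The first probability, `probs[0]`, is the unscattered fraction exp(-n), and the computed mean number of events reproduces n.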


Fast electrons incident on a thin film or bulk material undergo 
elastic scattering by the screened Coulomb potential of a target 
nucleus, and inelastic scattering by the electrons of the target ma- 
terial. As the nucleus is much more massive than the incident 
electron, classical kinematics dictates that the energy transfer is 
negligible, hence the designation of elastic scattering. There is 
appreciable momentum transfer, however. This is related to the
scattering angle $\vartheta$ by (4.100). The angular distribution for elastic
scattering is proportional to the differential cross section for small
angles. The small angle approximation is justified for small values
of n. We define a normalized angular distribution $\sigma(\vartheta)$ such that

$$ \int_0^{4\pi}\sigma(\vartheta)\,d\Omega = 2\pi\int_0^{\infty}\sigma(\vartheta)\,\vartheta\,d\vartheta = 1, \tag{4.195} $$

where $d\Omega$ is the element of solid angle. Equivalently, $\sigma(\vartheta)$ is the
differential elastic scattering cross section divided by the total
elastic cross section in the limit of small angles $\vartheta \ll 1$. We could use
the screened Coulomb scattering result (4.116),

$$ \sigma(\vartheta) = \frac{\vartheta_0^2}{\pi}\,\frac{1}{\left(\vartheta^2+\vartheta_0^2\right)^2}, \tag{4.196} $$

where $\vartheta_0$ is the screening angle, and $\sigma(\vartheta)$ is normalized to unity
with respect to solid angle. In the following analysis, we will not
restrict the form of $\sigma(\vartheta)$, however. In this sense the following can
be regarded as completely general with respect to the detailed
form of the single scattering, as long as the scattering angles are
small.




We assume the elastic scattering is axially symmetric. This implies 
that spin polarization is unimportant, and the scattering medium 
is isotropic. We further assume that scattering angles associated 
with inelastic scattering are negligible on average, and can be ig- 
nored for the present purpose. 


We will find it useful in the following to regard the scattering
angle as a two-dimensional vector $\mathbf{r}'$ with components $(x', y')$,
where $x' = dx/dz$ is the slope with respect to the transverse
x-coordinate, and $y' = dy/dz$ is the slope with respect to the
transverse y-coordinate. The magnitude of the scattering angle $\vartheta$
is given for small angles by

$$ \vartheta \approx |\mathbf{r}'| = \sqrt{x'^2 + y'^2}. \tag{4.197} $$

Given the distribution $\sigma(\mathbf{r}')$ for single scattering, we now seek the
distribution $\sigma_2(\mathbf{r}')$ for exactly two scattering events. This is

$$ \sigma_2(\mathbf{r}') = \int d^2r'_0\;\sigma(|\mathbf{r}'_0|)\,\sigma(|\mathbf{r}'-\mathbf{r}'_0|) = \sigma(\mathbf{r}') * \sigma(\mathbf{r}'), \tag{4.198} $$

where * denotes the two-dimensional convolution with respect to
slope components. Continuing this logic, the angular distribution
for exactly j scattering events is

$$ \sigma_j(\mathbf{r}') = \sigma(\mathbf{r}') * \cdots * \sigma(\mathbf{r}'), \tag{4.199} $$

where the two-dimensional convolution product contains j factors
of $\sigma$.


With this preparation complete, we are now in a position to state
the plural scattering problem in mathematical terms: given an
angular distribution $\sigma(\mathbf{r}')$ for single scattering, normalized to
unity, and a mean free path $\mu$, calculate the angular distribution
$F(\mathbf{r}'; z)$ for thickness z, in the presence of plural scattering. This
is found by summing over all numbers of scattering events j as follows:

$$ F(\mathbf{r}'; z) = \sum_{j=0}^{\infty} P_j(z/\mu)\,\sigma_j(\mathbf{r}'). \tag{4.200} $$
We now propose to eliminate the unwieldy convolution $\sigma_j$ by
taking the Fourier transform of both sides, and making use of the
convolution theorem. The two-dimensional Fourier transform of
$\sigma(\mathbf{r}')$ is defined as

$$ \tilde{\sigma}(\mathbf{l}) = \int d^2r'\;\sigma(\mathbf{r}')\,\exp\left[i(\mathbf{l}\cdot\mathbf{r}')\right], \tag{4.201} $$

where $\mathbf{l}$ is the two-dimensional vector representing the transform
variable conjugate to $\mathbf{r}'$. Making use of the radial symmetry,
$\sigma(\mathbf{r}') = \sigma(r')$, this becomes

$$ \tilde{\sigma}(l) = \int_0^{\infty} dr'\,r'\,\sigma(r')\int_0^{2\pi} d\phi\,\exp\left(i\,l\,r'\cos\phi\right). \tag{4.202} $$

The $\phi$-integral can be written in terms of

$$ J_0(x) = \frac{1}{2\pi}\int_0^{2\pi} d\phi\,\exp\left(ix\cos\phi\right), \tag{4.203} $$

where $J_0$ is the Bessel function of zero-order. This reduces to the
well-known Bessel transform,

$$ \tilde{\sigma}(l) = 2\pi\int_0^{\infty} dr'\,r'\,J_0(l r')\,\sigma(r'), \tag{4.204} $$

which is simply a two-dimensional Fourier transform of a radially
symmetric function. Applying the same logic to F, we obtain

$$ \tilde{F}(l; z) = 2\pi\int_0^{\infty} dr'\,r'\,J_0(l r')\,F(r'; z). \tag{4.205} $$


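Taking the two-dimensional Fourier transform of both sides of (4.200)
with respect to slope components, and making use of the convolution
theorem, we obtain

$$ \tilde{F}(l; z) = \sum_{j=0}^{\infty} P_j(n)\left[\tilde{\sigma}(l)\right]^j = e^{-n}\sum_{j=0}^{\infty}\frac{\left[n\,\tilde{\sigma}(l)\right]^j}{j!}, \tag{4.206} $$

where $l$ is the transform variable corresponding to the scattering
angle $r'$, where $r' \ll 1$. Performing the sum, we obtain

$$ \tilde{F}(l; z) = \exp\left\{-n\left[1-\tilde{\sigma}(l)\right]\right\}, \qquad n = z/\mu. \tag{4.207} $$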
The solution for $F(r'; z)$ is found by performing the inverse Bessel
transform,

$$ F(r'; z) = \frac{1}{2\pi}\int_0^{\infty} dl\,l\,J_0(l r')\,\tilde{F}(l; z). \tag{4.208} $$

This integral is typically performed numerically. In doing so, one
must subtract the unscattered beam $\exp(-n)$ from $\tilde{F}$, as this leads
to a delta function, which is poorly behaved. This represents the
solution for the angular distribution in the presence of small angle
plural scattering.
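The step from the Poisson-weighted sum to the exponential closed form can be checked directly in Fourier space. The sketch below assumes, purely for illustration, a Gaussian single-scattering law σ(r') = exp(-r'²/ϑ₀²)/(πϑ₀²), whose two-dimensional transform is exp(-l²ϑ₀²/4); the Gaussian form and the function names are not from the text.

```python
import math

def sigma_tilde_gaussian(l, theta0):
    """2-D Fourier transform of the assumed Gaussian single-scattering law."""
    return math.exp(-(l * theta0) ** 2 / 4.0)

def plural_series(l, n, theta0, j_max=200):
    """Sum over j of P_j(n) * sigma_tilde^j, built term by term
    so that no large factorials are formed."""
    s = sigma_tilde_gaussian(l, theta0)
    term = math.exp(-n)            # j = 0 term: the unscattered beam
    total = term
    for j in range(1, j_max + 1):
        term *= n * s / j          # ratio of successive Poisson-weighted terms
        total += term
    return total

def plural_closed_form(l, n, theta0):
    """exp{-n [1 - sigma_tilde(l)]}."""
    return math.exp(-n * (1.0 - sigma_tilde_gaussian(l, theta0)))

for n in (0.5, 5.0, 20.0):
    print(n, plural_series(1.5, n, 1.0), plural_closed_form(1.5, n, 1.0))
```

At large l the transform of the single-scattering law vanishes and both expressions tend to exp(-n), which is the unscattered beam that must be subtracted before the inverse transform is taken.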


It is instructive to derive $F(\mathbf{r}'; z)$ by an alternative method, which
will turn out to have more general applicability. The rate of change
of F with path length s can be expressed as

$$ \frac{d}{ds}F(\mathbf{r}'; z) = -\frac{1}{\mu}\,F(\mathbf{r}'; z) + \frac{1}{\mu}\int d^2r'_0\;F(\mathbf{r}'_0; z)\,\sigma(|\mathbf{r}'-\mathbf{r}'_0|). \tag{4.209} $$

The first term on the right represents scattering out of the solid
angle element $d\Omega$ at $\mathbf{r}'$, while the second term on the right
represents scattering into the solid angle $d\Omega$ at $\mathbf{r}'$ from all other
solid angles $d\Omega_0 = d^2r'_0$ at $\mathbf{r}'_0$. The quantity $1/\mu$ represents the
probability per unit length that a scattering event will take place,
remembering that $\mu$ is the mean free path. Using the chain rule for
partial differentiation, we expand the derivative with respect to path
length, obtaining

$$ \frac{d}{ds}F(\mathbf{r}'; z) = \left(\frac{dx'}{ds}\frac{\partial}{\partial x'} + \frac{dy'}{ds}\frac{\partial}{\partial y'} + \frac{dz}{ds}\frac{\partial}{\partial z}\right)F(\mathbf{r}'; z). \tag{4.210} $$

We note that $dx'/ds = dy'/ds = 0$, as the trajectories are straight
lines with constant slope between scattering events. Also,
$dz/ds \approx 1$ for small angles. This leads to

$$ \frac{\partial}{\partial z}F(\mathbf{r}'; z) = -\frac{1}{\mu}\,F(\mathbf{r}'; z) + \frac{1}{\mu}\,F(\mathbf{r}'; z) * \sigma(\mathbf{r}'). \tag{4.211} $$

This amounts to a transport equation, which governs the evolution
of the distribution function $F(\mathbf{r}'; z)$ as the beam propagates
through a thickness z.
Taking the two-dimensional Fourier transform with respect to $\mathbf{r}'$,
and applying the convolution theorem, we obtain

$$ \frac{\partial}{\partial z}\tilde{F}(l; z) = -\frac{1}{\mu}\,\tilde{F}(l; z)\left[1-\tilde{\sigma}(l)\right], \tag{4.212} $$

where we have made use of the radial symmetry of F and $\sigma$, as
before. This is immediately integrated to reproduce the previous
result.
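In Fourier space the transport equation is an ordinary differential equation in z for each value of l, so the integration can be carried out numerically and compared with the exponential solution. The following sketch uses a fourth-order Runge-Kutta step; the value 0.4 for the transform of the single-scattering law at the chosen l, and the lengths, are arbitrary illustrative numbers.

```python
import math

def integrate_transport(sigma_tilde, thickness, mu, steps=2000):
    """RK4 integration of dF/dz = -(1/mu) * F * (1 - sigma_tilde),
    starting from the incident beam F(z = 0) = 1."""
    rate = -(1.0 - sigma_tilde) / mu
    h = thickness / steps
    f = 1.0
    for _ in range(steps):
        k1 = rate * f
        k2 = rate * (f + 0.5 * h * k1)
        k3 = rate * (f + 0.5 * h * k2)
        k4 = rate * (f + h * k3)
        f += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return f

mu = 100e-9      # mean free path (m), hypothetical
z = 500e-9       # film thickness (m), hypothetical
numeric = integrate_transport(0.4, z, mu)
closed = math.exp(-(z / mu) * (1.0 - 0.4))
print(numeric, closed)
```

The same stepping scheme would still apply if the mean free path varied with depth, a case where the closed form is no longer available; that extension is not treated in the text.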


To this point we have only considered the distribution with respect 
to angle or slope. It is also of considerable interest to discuss the 
distribution with respect to transverse coordinates. This governs 
the lateral broadening of electron probes in thick films, as well as 
the resolution in transmission electron microscopes for thick speci- 
mens. In this case we must include the two-dimensional transverse 
position r and the two-dimensional slope vector r’. The geometry 
is shown in Figure 4.9. We define a distribution function $F(\mathbf{r}, \mathbf{r}'; z)$
as the probability per unit area per unit solid angle for the
particle at depth z, in the presence of plural scattering. Applying the
preceding logic, we expect F to satisfy

$$ \frac{d}{ds}F(\mathbf{r},\mathbf{r}'; z) = -\frac{1}{\mu}\,F(\mathbf{r},\mathbf{r}'; z) + \frac{1}{\mu}\int d^2r'_0\;F(\mathbf{r},\mathbf{r}'_0; z)\,\sigma(|\mathbf{r}'-\mathbf{r}'_0|). \tag{4.213} $$

To solve this equation for F, we begin by considering only one
transverse coordinate $x$, and one transverse slope component $x'$.
This is equivalent to a projection of the plural scattering problem
onto the longitudinal xz-plane. The transport equation in this
special case reduces to

$$ \frac{d}{ds}F(x, x'; z) = -\frac{1}{\mu}\,F(x, x'; z) + \frac{1}{\mu}\int dx'_0\;F(x, x'_0; z)\,\tau(x'-x'_0). \tag{4.214} $$
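The single scattering distribution $\tau(x')$ is a projection of the
two-dimensional single scattering distribution $\sigma(\mathbf{r}')$. For the
special case of screened Coulomb scattering, this is given by

$$ \tau(x') = \int_{-\infty}^{\infty} dy'\;\sigma\!\left(\sqrt{x'^2+y'^2}\right) = \frac{1}{2}\,\frac{\vartheta_0^2}{\left(x'^2+\vartheta_0^2\right)^{3/2}}. \tag{4.215} $$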
Figure 4.9: Geometry for plural scattering at depth z.


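We will not assume this special dependency in the following
analysis, but rather a general form for small-angle scattering $\tau(x')$.

Applying the chain rule, and expanding the total derivative as
before, we obtain

$$ \frac{d}{ds}F(x, x'; z) = \left(\frac{dx}{ds}\frac{\partial}{\partial x} + \frac{dx'}{ds}\frac{\partial}{\partial x'} + \frac{dz}{ds}\frac{\partial}{\partial z}\right)F(x, x'; z). \tag{4.216} $$

Again $dx'/ds = 0$. For small angles, $dx/ds \approx x'$ and $dz/ds \approx 1$,
leading to

$$ \left(x'\frac{\partial}{\partial x} + \frac{\partial}{\partial z}\right)F(x, x'; z) = -\frac{1}{\mu}\,F(x, x'; z) + \frac{1}{\mu}\,F(x, x'; z) * \tau(x'). \tag{4.217} $$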
As before, we propose to eliminate the unwieldy convolution by
taking the Fourier transform of both sides, and applying the
convolution theorem. We define the one-dimensional Fourier transforms
as

$$ \tilde{\tau}(l) = \int_{-\infty}^{\infty} dx'\;\tau(x')\,\exp\left(i\,l\,x'\right), $$

$$ \tilde{F}(k, l; z) = \int_{-\infty}^{\infty} dx\int_{-\infty}^{\infty} dx'\;F(x, x'; z)\,\exp\left[i(kx + lx')\right]. \tag{4.218} $$

Applying the operator

$$ \int_{-\infty}^{\infty} dx\int_{-\infty}^{\infty} dx'\;\exp\left[i(kx + lx')\right] \tag{4.219} $$

to both sides from the left, and interchanging the order of
integrations, we obtain the reduced equation

$$ \left(-k\frac{\partial}{\partial l} + \frac{\partial}{\partial z}\right)\tilde{F}(k, l; z) = -\frac{1}{\mu}\,\tilde{F}(k, l; z)\left[1-\tilde{\tau}(l)\right]. \tag{4.220} $$


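In order to integrate this equation, we propose a transformation
of variables, defining the new variables

$$ \xi = l + kz, \qquad \eta = l - kz. \tag{4.221} $$

Applying the chain rule for partial derivatives, we find

$$ \frac{\partial}{\partial l} = \frac{\partial\xi}{\partial l}\frac{\partial}{\partial\xi} + \frac{\partial\eta}{\partial l}\frac{\partial}{\partial\eta}, \qquad \frac{\partial}{\partial z} = \frac{\partial\xi}{\partial z}\frac{\partial}{\partial\xi} + \frac{\partial\eta}{\partial z}\frac{\partial}{\partial\eta}. \tag{4.222} $$

Substituting, we find

$$ -k\frac{\partial}{\partial l} + \frac{\partial}{\partial z} = -2k\frac{\partial}{\partial\eta}, \tag{4.223} $$

and, consequently,

$$ -2k\frac{\partial}{\partial\eta}\tilde{F}(k, l; z) = -\frac{1}{\mu}\,\tilde{F}(k, l; z)\left[1-\tilde{\tau}(l)\right]. \tag{4.224} $$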
We have succeeded in reducing the dimensionality to a single
integration variable $\eta$. This technique in the theory of partial
differential equations is known as the method of characteristics. We
can now proceed to perform the integration as

$$ \ln\tilde{F}(k, l; z) = \frac{1}{2k\mu}\int_{\xi} d\eta\,\left[1-\tilde{\tau}(l)\right], \tag{4.225} $$

where the subscript $\xi$ signifies that $\xi$ must be kept constant over
the integration path. We treat the variable k as constant, as no
derivative of k appears. We also make use of

$$ 2l = \xi + \eta, \qquad 2\,dl = d\xi + d\eta. \tag{4.226} $$

Since $\xi$ = const, and hence $d\xi = 0$ for the integration, this gives

$$ \ln\tilde{F}(k, l; z) = \frac{1}{k\mu}\int_{\xi} dl\,\left[1-\tilde{\tau}(l)\right]. \tag{4.227} $$

This is immediately integrated to give

$$ \tilde{F}(k, l; z) = \exp\left\{\frac{1}{k\mu}\left[g(l) - g(l+kz)\right]\right\}, \tag{4.228} $$

where we have defined g by the indefinite integral

$$ g(l) = \int dl\,\left[1-\tilde{\tau}(l)\right]. \tag{4.229} $$


We note that

$$ g(l+kz) = g(\xi) = \text{const}, \tag{4.230} $$

since $\xi$ = const in the integration. The reader can verify by direct
substitution that this is indeed the correct solution. It only remains
to perform the inverse Fourier transform to obtain the solution,
namely

$$ F(x, x'; z) = \frac{1}{(2\pi)^2}\int_{-\infty}^{\infty} dk\int_{-\infty}^{\infty} dl\;\tilde{F}(k, l; z)\,\exp\left[-i(kx + lx')\right]. \tag{4.231} $$

Given the single scattering law $\tau(x')$, projected onto the
xz-plane, we thus obtain the projected plural scattering distribution
$F(x, x'; z)$ in principle. Typically, this last integral is performed
numerically, after subtracting the unscattered beam $\exp(-z/\mu) =
\exp(-n)$ from $\tilde{F}$.
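The direct substitution mentioned above can also be carried out numerically. The sketch below assumes a Gaussian form exp(-l²ϑ₀²/4) for the transform of the projected single-scattering law, for which g(l) has a closed form in terms of the error function. It then evaluates the residual of the reduced equation, (-k ∂/∂l + ∂/∂z)F̃ + (1/μ)F̃[1 - τ̃(l)], by central differences. The Gaussian choice, the constants `THETA0` and `MU`, and the function names are illustrative assumptions.

```python
import math

THETA0 = 1.0   # hypothetical angular width of the single-scattering law
MU = 1.0       # mean free path, in the same length unit as z

def tau_tilde(l):
    """Assumed Gaussian transform of the projected single-scattering law."""
    return math.exp(-(l * THETA0) ** 2 / 4.0)

def g(l):
    """Antiderivative of 1 - tau_tilde(l); the additive constant
    cancels in the difference g(l) - g(l + k z)."""
    return l - math.sqrt(math.pi) / THETA0 * math.erf(l * THETA0 / 2.0)

def f_tilde(k, l, z):
    """Candidate solution exp{[g(l) - g(l + k z)] / (k mu)}."""
    return math.exp((g(l) - g(l + k * z)) / (k * MU))

def residual(k, l, z, h=1e-5):
    """(-k d/dl + d/dz) F + (1/mu) F (1 - tau_tilde), by central differences."""
    d_l = (f_tilde(k, l + h, z) - f_tilde(k, l - h, z)) / (2.0 * h)
    d_z = (f_tilde(k, l, z + h) - f_tilde(k, l, z - h)) / (2.0 * h)
    return -k * d_l + d_z + f_tilde(k, l, z) * (1.0 - tau_tilde(l)) / MU

print(residual(0.7, 1.3, 2.0))
print(f_tilde(1e-6, 1.3, 2.0), math.exp(-2.0 * (1.0 - tau_tilde(1.3)) / MU))
```

The residual vanishes to within the finite-difference error, and the second line shows the solution approaching exp{-(z/μ)[1 - τ̃(l)]} as k becomes small.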


It is instructive to investigate several limiting cases. In the limit
of zero thickness, z = 0, we find immediately that

$$ \tilde{F}(k, l; 0) = 1. \tag{4.232} $$

Performing the inverse transform, this leads to

$$ F(x, x'; 0) = \delta(x)\,\delta(x'), \tag{4.233} $$

thus recovering the incident beam, as required.

In the limit $k \to 0$, a Taylor expansion gives us

$$ g(l+kz) \approx g(l) + g'(l)\,kz, \tag{4.234} $$

to first order in k. In this limit, $\tilde{F}$ reduces to

$$ \tilde{F}(0, l; z) = \exp\left\{-\frac{z}{\mu}\left[1-\tilde{\tau}(l)\right]\right\}, \tag{4.235} $$

which represents the projected angular distribution. This is
expected, as k = 0 in Fourier space represents an integral over all x
in direct space.


In the limit $l = 0$, we obtain

$$ \tilde{F}(k, 0; z) = \exp\left\{\frac{1}{k\mu}\left[g(0) - g(kz)\right]\right\}. \tag{4.236} $$

Setting $l = 0$ in Fourier space represents an integral over all
scattering angles in direct space. The distribution $F(x, 0; z)$ in direct
space is obtained from the inverse Fourier transform

$$ F(x, 0; z) = \frac{1}{2\pi}\int_{-\infty}^{\infty} dk\;\exp(-ikx)\,\tilde{F}(k, 0; z). \tag{4.237} $$

Physically, this represents the line spread function, corresponding
to scanning an incident probe beam along the infinite y-axis, and
observing in the xz-plane.


With these mathematical methods established, we are now in a
position to solve for the full three-dimensional distribution
function $F(\mathbf{r}, \mathbf{r}'; z)$ as a function of transverse coordinate
$\mathbf{r}$ and slope $\mathbf{r}'$ at depth z. The rate of change of F with path
length in polar coordinates $(r, \phi)$ is given by the chain rule as

$$ \frac{d}{ds}F(r, r', \phi'; z) = \left(\frac{dr}{ds}\frac{\partial}{\partial r} + \frac{d\phi}{ds}\frac{\partial}{\partial\phi} + \frac{dz}{ds}\frac{\partial}{\partial z}\right)F(r, r', \phi'; z), \tag{4.238} $$

where $dr'/ds = 0$, and $d\phi'/ds = 0$, because the trajectories form
straight lines between scattering events. Making use of the axial
symmetry, F is independent of azimuth $\phi$, in which case the
second term on the right vanishes. We note that F depends on the
azimuthal slope component $\phi'$, as the scattering angle $\mathbf{r}'$ has a
skew component in general for two or more scattering events. For
small angle scattering, $dz/ds \approx 1$, in which case we can substitute

$$ \frac{d}{ds}F(r, r', \phi'; z) = \left(r'\frac{\partial}{\partial r} + \frac{\partial}{\partial z}\right)F(r, r', \phi'; z). \tag{4.239} $$

Applying the logic of the preceding section, the transport equation
is

$$ \left(r'\frac{\partial}{\partial r} + \frac{\partial}{\partial z}\right)F(r, r', \phi'; z) = -\frac{1}{\mu}\,F(r, r', \phi'; z) + \frac{1}{\mu}\int d^2r'_0\;F(r, r'_0, \phi'_0; z)\,\sigma(|\mathbf{r}'-\mathbf{r}'_0|), \tag{4.240} $$


where o(|r’|) is the differential cross section for elastic scattering, 
normalized to unity. As before, the first term on the right repre- 
sents absorption into all scattering angles from the angle of in- 
terest, and the second term on the right represents emission from 
all scattering angles into the angle of interest. This equation is 
formally similar to the preceding case. Consequently, the preced- 
ing analysis can be adopted, being mindful of the various vector 
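components. The transformed equation is

$$ \left(-k\frac{\partial}{\partial l_r} + \frac{\partial}{\partial z}\right)\tilde{F}(k, l_r, l_\phi; z) = -\frac{1}{\mu}\,\tilde{F}(k, l_r, l_\phi; z)\left[1-\tilde{\sigma}(l)\right], \tag{4.241} $$

where $(k, l_r, l_\phi)$ are the Fourier transform variables corresponding
to $(r, r', \phi')$, respectively, and

$$ l = |\mathbf{l}| = \sqrt{l_r^2 + l_\phi^2}. \tag{4.242} $$

Applying the method of characteristics, we define the variables

$$ \xi = l_r + kz, \qquad \eta = l_r - kz, \tag{4.243} $$

in which case, the transformed equation reduces to

$$ -2k\frac{\partial\tilde{F}}{\partial\eta} = -\frac{1}{\mu}\,\tilde{F}\left[1-\tilde{\sigma}(l)\right]. \tag{4.244} $$

Integrating this with respect to $\eta$,

$$ \ln\tilde{F} = \frac{1}{2k\mu}\int_{\xi} d\eta\,\left[1-\tilde{\sigma}(l)\right]. \tag{4.245} $$

Noting that

$$ 2l_r = \xi + \eta, \qquad 2\,dl_r = d\xi + d\eta, \tag{4.246} $$

with $\xi$ = const, and hence $d\xi = 0$ for the integration, this gives

$$ \ln\tilde{F} = \frac{1}{k\mu}\int_{\xi} dl_r\left[1-\tilde{\sigma}\!\left(\sqrt{l_r^2+l_\phi^2}\right)\right]. \tag{4.247} $$

This is immediately integrated to give

$$ \tilde{F}(k, l_r, l_\phi; z) = \exp\left\{\frac{1}{k\mu}\left[g(l_r, l_\phi) - g(l_r + kz, l_\phi)\right]\right\}, \tag{4.248} $$

where we have defined $g(l_r, l_\phi)$ by the indefinite integral

$$ g(l_r, l_\phi) = \int dl_r\left[1-\tilde{\sigma}\!\left(\sqrt{l_r^2+l_\phi^2}\right)\right], \tag{4.249} $$

and $l_\phi$ is regarded as constant under the integral.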


Investigating the limiting cases, we see that, for z = 0, 
Fb, lig 0) = 1. (4.250) 
Performing the inverse transform, 
F(r,r’, o; 0) = 6(r) 6(r’), (4.251) 
thus recovering the incident beam, as required. 


In the limit k — 0, we can write the Taylor expansion for 
g(l, + kz, lg) to first order as 


o 
gll, + kz, ly) = ahs ly) +kz- ƏL gll, lọ), (4.252) 


in which case, 
F (k,l, 1g; z) = F (0, Ip, lg; z) = exp {-2 [1 — a(I)] | , (4.253) 


remembering that $l = |\mathbf{l}| = \sqrt{l_r^2 + l_\phi^2}$. This is
immediately recognizable as the angular distribution. This is
expected, as $k = 0$ in Fourier space represents an integral over
the entire range of radial coordinate, $0 < r < \infty$, in direct
space. This result is superfluous, as it was derived previously by
simpler methods.


Finally, setting $l = 0$, we find


Reis p= ae E [(0,0) — a(kz,0)] , 


(4.254) 
where this represents the integral over all slopes |I| in direct space. 
The distribution in the transverse radial coordinate r is found by 
performing the inverse Bessel transform, 


F(r:z) = l AENEA F(k, 0, 0:2). (4.255) 
This represents the radial point spread function at depth z, in the
presence of plural scattering. This being the case, it follows that 
F(k, 0,0; z) represents the modulation transfer function, as this is 
the Fourier transform of the point spread function. This completes 
the general solution to the plural scattering problem. 


Chapter 5 


Electron emission from 
solids 


Every practical electron beam instrument relies on a stable, long- 
lived electron source. The most commonly used sources extract 
electrons from a bulk metal or semiconductor, and accelerate the 
particles across a vacuum gap using the electric field of an elec- 
trode at a positive potential relative to the source. The beam thus 
appears to originate from an apparent source, real or virtual, seen 
looking back from the electron optical system. 


For practical purposes this apparent source is characterized by 
its measurable macroscopic properties. These include beam en- 
ergy, current, lateral intensity distribution, angular intensity dis- 
tribution, and energy spread. These quantities are a function of 
the physical source properties, including geometry and material. 
They also depend on the temperature and applied electric field 
as controllable operating parameters. Because of brightness con- 
servation, the macroscopic source properties govern the optical 
properties of the entire optical system. 


Electron emission is a fundamentally quantum mechanical pro- 
cess. A great deal of insight can be gained by regarding the bulk 
emitter material in terms of a relatively simple model originally 
proposed by Sommerfeld [82]. In this model, the conduction
electrons are approximately free to diffuse throughout the bulk
material. Electrons in the conduction band occupy energy states.
The Pauli exclusion principle dictates that no more than one
electron can occupy any given state. The average occupation number
for a state obeys Fermi-Dirac statistics, and is between zero and
one. In the limit where the absolute temperature $T$ approaches
zero, all states with energy $\varepsilon$ in the range
$0 < \varepsilon < \zeta$ are occupied by one electron, where
$\zeta$ is called the Fermi energy. In this limit all states with
energy higher than the Fermi energy are unoccupied.


In the following we assume that the emission surface is planar, 
and infinite in lateral extent. In this approximation the problem 
can be regarded as spatially one-dimensional, with the x-axis per- 
pendicular to the emission surface. Some fraction of the conduction 
electrons drift to the surface, where they can be emitted into the 
vacuum to form a beam. Once emitted, an electron experiences a 
Coulomb force which tends to attract it back toward the emission 
surface. This is called an image force, and is described in detail 
in the following section. It gives rise to a potential energy barrier 
which must be overcome in order for the electron to be emitted 
into the vacuum. 


In this one-dimensional model we consider the potential energy of 
an electron to be zero everywhere inside the bulk material. Elec- 
trons are free to drift throughout the bulk material, with a net 
flux incident on the emission surface from within the material. For 
electrons with a specific total energy W within the bulk material, 
we assume a current density J(W) incident on the emission sur- 
face from within. Here J(W) has dimensions of charge per unit 
transverse area per unit time per unit energy W. We further as- 
sume a single electron with energy W has a probability D(W) of 
overcoming the potential barrier, to be emitted into the vacuum. 
The total emission current density $j$ is then given by

$$ j = \int_0^\infty dW\, J(W)\, D(W), \tag{5.1} $$

where we have integrated over all possible values of the energy W.

In practice many sources are not planar. Furthermore, the energy
bands of the solid emitter material tend to bend at the interface 
with the vacuum. This band-bending can be enhanced by prepar- 
ing the emitter surface with an additional surface layer. These 
factors affect the actual emission, and the resulting macroscopic 
source properties. We define the electron affinity as the energy 
needed in practice to remove a single electron from the emitter. 
Because of these factors, the following analysis must be regarded 
as approximate. As is often the case, one replaces an intractable 
problem by a related problem which can be solved, producing an 
approximate result. 


To summarize, the emission current density depends on the tem- 
perature and the applied electric field. In the limit of zero applied 
electric field F and elevated temperature T, the current is called 
thermionic emission. In the limit of zero temperature T and el- 
evated electric field F, the current is called field emission. The 
central problem of this chapter is to determine the emitted cur- 
rent density j as a general function of temperature T and applied 
electric field F. 


5.1 The image force 


An electron which has been emitted into the vacuum experiences 
an electrostatic force which attracts it back toward the emission 
surface. This can be understood by examining the Coulomb force 
on an electron with charge —e located in the vacuum a distance x 
from a planar conductive surface at ground potential. This situa- 
tion is completely equivalent to a configuration where the planar 
surface is replaced by a charge +e located behind the emission 
surface at a distance 2x from the electron. This fictitious charge 
$+e$ is called an image charge, and the attractive force is called
the image force. This was first understood by Nordheim [66]. The
force $F(x)$ on the electron is given by

$$ F(x) = -\frac{e^2}{16\pi\epsilon_0\, x^2}, \tag{5.2} $$

where the minus sign indicates that the vector Coulomb force
points in the negative x-direction, back toward the emission
surface.
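
As a brief check on the factor of 16 in (5.2), the following sketch applies Coulomb's law to the electron $-e$ and its image charge $+e$, which are separated by a distance $2x$. It uses only quantities already defined in this section.

```latex
F(x) = -\frac{1}{4\pi\epsilon_0}\,\frac{e^2}{(2x)^2}
     = -\frac{e^2}{16\pi\epsilon_0\,x^2}.
```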


We consider a virtual displacement of the electron from coordinate
$+x$ to $+\infty$ in the presence of the force $F(x)$. This results
in a change in potential energy $U(x)$ given by

$$ U(x) = \int_x^\infty F(x')\, dx' = -\frac{e^2}{16\pi\epsilon_0\, x}, \tag{5.3} $$

where this is the work needed to remove the electron from $+x$ to
$+\infty$. This properly accounts for the fact that the image charge
undergoes a virtual displacement equal and opposite to the electron.
An individual electron must have enough energy to surmount the 
potential energy barrier in order to be emitted into the vacuum. 
Alternatively the electron can tunnel through the barrier in the 
presence of an applied electric field. In either case, the expression 
(5.1) for j applies. 
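
To give a feeling for the size of the image potential (5.3), the following minimal sketch evaluates it in electron-volts at a few distances from the surface. The function name and the sample distances are illustrative assumptions, not taken from the text; the physical constants are standard SI values.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge, C
EPS0 = 8.8541878128e-12      # vacuum permittivity, F/m


def image_potential_eV(x_m):
    """Image potential energy U(x) of (5.3), in eV, at a distance x_m in meters."""
    u_joule = -E_CHARGE**2 / (16.0 * math.pi * EPS0 * x_m)
    return u_joule / E_CHARGE


# Illustrative distances (assumed values).
for x_nm in (0.5, 1.0, 2.0):
    print(f"x = {x_nm} nm: U = {image_potential_eV(x_nm * 1e-9):.3f} eV")
```

At a distance of one nanometer the magnitude is a fraction of an electron-volt, which is small compared with a typical work function but not negligible.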


5.2 The incident current density 


We now turn our attention to the current density J(W) of elec- 
trons with total energy W incident on the emission surface from 
within the bulk material in one spatial dimension. Within a metal 
the conduction electrons are approximately free. We can therefore 
choose the potential energy to be zero. We denote the total energy 
of a single electron inside the metal in three spatial dimensions 
by $\varepsilon$. In the following we deduce the one-dimensional properties
from the three-dimensional properties, making use of the planar 
symmetry. 


The density of energy states for a nearly free electron within the
solid is given in three dimensions (3.73) as

$$ \frac{dN}{d\varepsilon} = \frac{8\pi V}{h^3}\, m \sqrt{2m\varepsilon}, \tag{5.4} $$

where V is the volume, h is Planck's constant, and m is the
electron mass. This is the number of available states per unit
energy interval $d\varepsilon$. Each electron energy level has two
spin states with the same total energy $\varepsilon$. To properly
account for this, we have multiplied the right-hand side of (3.73)
by two. The Pauli exclusion principle permits, at most, one
electron occupying a given state.

The expectation value of the occupation number of a state of
energy $\varepsilon$ is governed by Fermi-Dirac statistics, and is
given by

$$ n(\varepsilon) = \left[ \exp\left( \frac{\varepsilon - \zeta}{kT} \right) + 1 \right]^{-1}, \tag{5.5} $$

where k is Boltzmann's constant, and T is the absolute
temperature. The energy $\zeta$ is commonly referred to as the
chemical potential per atom, and alternatively as the Fermi energy.
It is easy to verify that $0 < n(\varepsilon) < 1$, consistent with
the Pauli exclusion principle. It follows that the average charge
density within the material is given as a function of total energy
$\varepsilon$ by

$$ \rho(\varepsilon) = \frac{e}{V} \left( \frac{dN}{d\varepsilon} \right) n(\varepsilon) = \frac{8\pi e}{h^3}\, m \sqrt{2m\varepsilon}\; n(\varepsilon), \tag{5.6} $$

where $\rho(\varepsilon)\, d\varepsilon$ is the charge per unit volume
of conduction electrons with total energy between $\varepsilon$ and
$\varepsilon + d\varepsilon$. Assuming the potential energy is zero
everywhere, the total energy $\varepsilon$ is given in terms of the
electron velocity v by

$$ \varepsilon = \tfrac{1}{2} m v^2. \tag{5.7} $$

Since the energy $\varepsilon$ depends only on the magnitude of the
velocity and not on the direction, it follows that the electron velocities
are distributed isotropically with respect to propagation direction 
within the bulk material in this approximation. 
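
The behavior of the occupation number (5.5) is easy to explore numerically. The sketch below is a minimal illustration; the function name, the Fermi energy of 5 eV, and the temperature of 300 K are assumed values chosen only for the example.

```python
import math

K_B_EV = 8.617333262e-5   # Boltzmann constant, eV/K


def occupation(eps, zeta, temperature):
    """Fermi-Dirac occupation number (5.5); energies in eV, temperature in K."""
    x = (eps - zeta) / (K_B_EV * temperature)
    # Guard against overflow of exp() far above the Fermi energy.
    if x > 700.0:
        return 0.0
    return 1.0 / (math.exp(x) + 1.0)


# Illustrative values (assumptions): zeta = 5 eV, T = 300 K.
for eps in (4.9, 5.0, 5.1):
    print(f"eps = {eps} eV: n = {occupation(eps, 5.0, 300.0):.4f}")
```

The occupation is one half at the Fermi energy, and falls from nearly one to nearly zero over an energy interval of a few kT.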


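Choosing the x-axis to be perpendicular to the emission surface,
the current density component $j_x(\varepsilon)$ is given by

$$ j_x(\varepsilon) = \rho(\varepsilon)\, v_x = \rho(\varepsilon) \sqrt{\frac{2\varepsilon}{m}}\, \cos\theta, \tag{5.8} $$

where $v_x$ is the x-component of the velocity and $\theta$ is the
polar angle which the velocity vector makes with the x-axis. From
(5.5, 5.6, 5.8) we obtain the x-component of the current density as

$$ j_x(\varepsilon) = \frac{16\pi m e}{h^3}\, \varepsilon\, n(\varepsilon) \cos\theta \tag{5.9} $$

for $0 < \theta < \pi$. Next we define the differential

$$ dj_x(\varepsilon) = j_x(\varepsilon)\, \frac{d\Omega}{4\pi}, \tag{5.10} $$

where $d\Omega$ is the solid angle element given by
$d\Omega = 2\pi \sin\theta\, d\theta$. Substituting,

$$ dj_x(\varepsilon) = \frac{8\pi m e}{h^3}\, \varepsilon\, n(\varepsilon) \cos\theta \sin\theta\, d\theta. \tag{5.11} $$

At this point we define the total energy W in the x-direction as

$$ W = \tfrac{1}{2} m v_x^2 = \varepsilon \cos^2\theta. \tag{5.12} $$

Substituting,

$$ dj_x(\varepsilon) = \frac{8\pi m e}{h^3}\, W\, n(W \sec^2\theta) \tan\theta\, d\theta. \tag{5.13} $$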

We define the current density J(W) in the one-dimensional problem
according to

$$ dJ(W)\, dW = dj_x(\varepsilon)\, d\varepsilon, \tag{5.14} $$

where this ensures that the total current is the same for the
one-dimensional and three-dimensional problems. Substituting,

$$ dJ(W) = \frac{8\pi m e}{h^3}\, W\, n(W \sec^2\theta) \sec^2\theta \tan\theta\, d\theta. \tag{5.15} $$

We define a new variable $\xi$ by

$$ \xi = \sec^2\theta, \qquad d\xi = 2 \sec^2\theta \tan\theta\, d\theta. \tag{5.16} $$


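Substituting, making use of (5.5), and integrating over the range
$1 < \xi < \infty$, we find

$$ J(W) = \frac{4\pi m e}{h^3}\, W \int_1^\infty d\xi \left[ \exp\left( \frac{\xi W - \zeta}{kT} \right) + 1 \right]^{-1}. \tag{5.17} $$

Performing the integral is straightforward, and is left as an exercise
for the reader. We obtain the result

$$ J(W) = \frac{4\pi m e\, kT}{h^3}\, \ln\left[ \exp\left( -\frac{W - \zeta}{kT} \right) + 1 \right], \tag{5.18} $$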


where J(W) has dimensions of current per unit transverse area 
per unit energy interval. This is the main result of this section. 
It is identical with the result obtained by Kemble [52], and used 
later by Murphy and Good [64]. 
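
The step from (5.17) to (5.18) can be checked numerically without carrying out the integral analytically. The sketch below compares the two W-dependent factors, with the common prefactor $4\pi m e/h^3$ dropped. It assumes a simple composite Simpson rule and a truncated upper limit; the function names and the sample values of $\zeta$ and $kT$ are illustrative assumptions.

```python
import math


def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of intervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def integral_517(W, zeta, kT):
    """W times the xi-integral of (5.17), evaluated with the substitution u = xi * W."""
    def fermi(u):
        return 1.0 / (math.exp((u - zeta) / kT) + 1.0)
    upper = max(W, zeta) + 50.0 * kT   # the integrand is negligible beyond this point
    return simpson(fermi, W, upper)


def closed_form_518(W, zeta, kT):
    """kT * ln[exp(-(W - zeta)/kT) + 1], the W-dependent factor of (5.18)."""
    return kT * math.log(math.exp(-(W - zeta) / kT) + 1.0)


# Illustrative values in eV (assumptions): zeta = 5.0, kT = 0.1.
for W in (4.8, 5.0, 5.3):
    print(W, integral_517(W, 5.0, 0.1), closed_form_518(W, 5.0, 0.1))
```

The two columns agree to within the quadrature error, which supports the closed form without replacing the analytical exercise.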


Anticipating the case of cold field emission, it is useful to explore
the limit $T \to 0$. We obtain

$$ J(W) \approx \frac{4\pi m e}{h^3}\, (\zeta - W), \tag{5.19} $$

where we note that $0 < W < \zeta$ in this limit.


Using the general expression (5.18) for the incident current density 
J(W) as a function of the total energy W in one spatial dimension, 
we next proceed to calculate the transmission probability D(W), 
and the resulting emission current density j for various combina- 
tions of the temperature T and the applied field F. This is the 
topic of the following sections. 
Problems

1. Perform the integral (5.17) to obtain the result (5.18).


2. Verify the result (5.19) for the limit $T \to 0$ by repeating the
procedure of this section, making use of the fact that

$$ n(\varepsilon) = \left[ \exp\left( \frac{\varepsilon - \zeta}{kT} \right) + 1 \right]^{-1} \to \begin{cases} 1, & \varepsilon < \zeta \\ 0, & \varepsilon > \zeta. \end{cases} \tag{5.20} $$


3. Derive an analytical expression for the Fermi energy $\zeta$ of a
metal based on the number of conduction band electrons per unit
volume and the density of states with respect to total energy $\varepsilon$.


5.3 Thermionic emission 


In the case of zero applied electric field and elevated temperature, 
the energy needed for an electron to surmount the potential bar- 
rier is thermal. This is called thermionic emission. The central 
problem in this section is to calculate the emission current density 
j for thermionic emission. We make use of (5.1, 5.18). The task 
remains to calculate the probability D(W) that an electron with 
total energy W will be transmitted across the barrier. 


The potential energy U(x) associated with the image force is given
by (5.3). This is plotted as a function of coordinate x in the
direction normal to the emission surface in Figure 5.1. The surface
of the metal is at coordinate x = 0. The left region x < 0 represents
the interior of the bulk emitting material, and the right region
x > 0 represents the vacuum. The solid curve is the potential
energy U(x).


Figure 5.1: Energy diagram for thermionic emission.


We consider a single electron with energy W inside the bulk emit- 
ter material with x < 0. We approximate the potential energy 
U(x) by a square barrier. The problem of scattering of matter 
waves by a square barrier in one dimension is treated in many 
books on elementary quantum mechanics, see for example Liboff 
[59]. 


Inside the bulk material with x < 0 the wave function is a
superposition of a right-propagating incident wave plus a
left-propagating reflected wave. This is represented by

$$ u(x < 0) = a_+ e^{i k_1 x} + a_- e^{-i k_1 x}, \tag{5.21} $$

where the complex constants $a_+$ and $a_-$ have yet to be
determined. The direction of propagation can be verified by forming
the complete wave function $\psi(x,t)$ given by

$$ \psi(x, t) = u(x)\, e^{-iWt/\hbar}, \tag{5.22} $$

which represents two counterpropagating, travelling waves. The
wave number $k_1$ is given by

$$ k_1 = +\sqrt{\frac{2mW}{\hbar^2}}, \tag{5.23} $$

where we intentionally choose the positive root. In the vacuum
with x > 0 the wave function is

$$ u(x > 0) = b_+ e^{i k_2 x}, \tag{5.24} $$

where the complex constant $b_+$ has yet to be determined. The wave
number $k_2$ is given by

$$ k_2 = +\sqrt{\frac{2m(W - C)}{\hbar^2}}. \tag{5.25} $$

We assume that no left-propagating wave exists in the vacuum
region x > 0. We need only consider energy W > C, since there
can be no transmission for W < C.


We require that the wave function and its first derivative be
continuous at x = 0. This leads to the coupled equations

$$ a_+ + a_- = b_+, \qquad a_+ - a_- = \frac{k_2}{k_1}\, b_+. \tag{5.26} $$

These can be immediately reduced to

$$ \frac{b_+}{a_+} = \frac{2}{1 + k_2/k_1}, \qquad \frac{a_-}{a_+} = \frac{1 - k_2/k_1}{1 + k_2/k_1}. \tag{5.27} $$

Each propagating wave has an associated probability current
given by

$$ j = \frac{i\hbar}{2m} \left[ u(x)\, \bar{u}'(x) - \bar{u}(x)\, u'(x) \right]. \tag{5.28} $$
Substituting the wave functions, we have

$$ j_+(x<0) = \frac{\hbar k_1}{m}\, |a_+|^2, \qquad j_-(x<0) = -\frac{\hbar k_1}{m}\, |a_-|^2, \qquad j_+(x>0) = \frac{\hbar k_2}{m}\, |b_+|^2. \tag{5.29} $$

These represent the incident, reflected, and transmitted currents,
respectively. The transmission probability D(W) is the ratio of the
transmitted current divided by the incident current. This is

$$ D(W) = \frac{4\sqrt{1 - C/W}}{\left( 1 + \sqrt{1 - C/W} \right)^2}, \tag{5.30} $$

where we have made use of

$$ \frac{k_2}{k_1} = \sqrt{\frac{W - C}{W}}, \tag{5.31} $$

where $0 < k_2/k_1 < 1$ for $C < W < \infty$. Also, D(W) = 0 for
W < C, since it is impossible for an electron to surmount or
tunnel through the potential barrier in this case. Also,
$0 \le D(W) < 1$, as required for a probability.
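
The transmission probability (5.30) is simple to tabulate. The following minimal sketch does so for a few energies; the function name and the barrier height of 10 eV are illustrative assumptions.

```python
import math


def transmission_step(W, C):
    """Transmission probability (5.30) for a step of height C; W and C in the same units."""
    if W <= C:
        return 0.0
    ratio = math.sqrt(1.0 - C / W)   # k2/k1 from (5.31)
    return 4.0 * ratio / (1.0 + ratio) ** 2


# Illustrative barrier height (assumption): C = 10 eV.
for W in (9.0, 10.1, 12.0, 20.0, 100.0):
    print(f"W = {W:6.1f} eV: D = {transmission_step(W, 10.0):.4f}")
```

The table shows that D(W) rises steeply from zero just above the barrier and approaches unity only for energies well above C, so that quantum reflection is significant for electrons which barely clear the barrier.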


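We are now in a position to calculate the emission current density
j, given by (5.1). Making use of (5.18, 5.30), we have

$$ j = \frac{4\pi m e\, kT}{h^3} \int_C^\infty dW\, \ln\left[ \exp\left( -\frac{W - \zeta}{kT} \right) + 1 \right] \frac{4\sqrt{1 - C/W}}{\left( 1 + \sqrt{1 - C/W} \right)^2}. \tag{5.32} $$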


This integral cannot easily be evaluated as a closed-form expres- 
sion, but is amenable to straightforward numerical evaluation. 
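
As one possible numerical treatment of (5.32), the sketch below computes the ratio of the integral in (5.32) to the same integral with D(W) replaced by unity. This ratio can be read as an effective mean transmission. The substitution $W - C = s^2$ is an assumption of this sketch, chosen to remove the square-root behavior of the integrand at W = C; the function names and the sample values of C, $C - \zeta$, and kT are likewise illustrative.

```python
import math


def simpson(f, a, b, n=4000):
    """Composite Simpson rule on [a, b] with an even number n of intervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def transmission_step(W, C):
    """Transmission probability (5.30) for a step of height C."""
    if W <= C:
        return 0.0
    ratio = math.sqrt(1.0 - C / W)
    return 4.0 * ratio / (1.0 + ratio) ** 2


def mean_transmission(C, C_minus_zeta, kT):
    """Integral of (5.32) divided by the same integral with D(W) = 1."""
    s_max = math.sqrt(40.0 * kT)   # W - C = s**2 runs up to 40 kT

    def supply(excess):
        # ln[exp(-(W - zeta)/kT) + 1] with W - zeta = excess + (C - zeta)
        return math.log1p(math.exp(-(excess + C_minus_zeta) / kT))

    full = simpson(
        lambda s: 2.0 * s * supply(s * s) * transmission_step(s * s + C, C), 0.0, s_max
    )
    unit = simpson(lambda s: 2.0 * s * supply(s * s), 0.0, s_max)
    return full / unit


# Illustrative values in eV (assumptions): C = 10, C - zeta = 4.5, kT = 0.2.
print(mean_transmission(10.0, 4.5, 0.2))
```

For parameters of this order the ratio is well below unity, because most of the emitted electrons have energies only a few kT above the barrier, where D(W) is small.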


Considerable physical insight can be gained by approximating
$D(W) \approx 1$, corresponding to perfect transmission for energies
W > C. We define a new variable $\xi = W - C$. We further define a
quantity $\phi$ called the work function by

$$ \phi = C - \zeta. \tag{5.33} $$

Physically, $\phi$ represents the energy which must be supplied to an
electron at the Fermi level in order for emission to occur. This
quantity is a unique property of the bulk material. In practice, the
work function differs from the electron affinity defined above for
an actual emitter.


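The integral for the emission current density $j$ is approximated
as

$$ j \approx \frac{4\pi m e\, kT}{h^3} \int_0^\infty d\xi\, \ln\left[ \exp\left( -\frac{\xi + \phi}{kT} \right) + 1 \right]. \tag{5.34} $$

We further approximate

$$ \exp\left( -\frac{\xi + \phi}{kT} \right) \ll 1. \tag{5.35} $$

Expanding the logarithm, the integral for j becomes

$$ j \approx \frac{4\pi m e\, kT}{h^3} \int_0^\infty d\xi\, \exp\left( -\frac{\xi + \phi}{kT} \right). \tag{5.36} $$

The integral over $\xi$ is readily performed. The emission current
density j is given by

$$ j = \frac{4\pi m e}{h^3}\, (kT)^2 \exp\left( -\frac{\phi}{kT} \right). \tag{5.37} $$

This represents the main result of this section. It is known as the
Richardson-Dushman law for thermionic emission [60].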
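
The result (5.37) is readily evaluated in SI units. The following minimal sketch computes the coefficient of $T^2$ and the emission current density for one operating point. The function name, the work function of 4.5 eV, and the temperature of 2700 K are illustrative assumptions; the physical constants are standard SI values.

```python
import math

M_E = 9.1093837015e-31       # electron mass, kg
E_CHARGE = 1.602176634e-19   # elementary charge, C
K_B = 1.380649e-23           # Boltzmann constant, J/K
H = 6.62607015e-34           # Planck constant, J s


def richardson_current_density(phi_eV, temperature):
    """Emission current density (5.37) in A/m^2; work function in eV, temperature in K."""
    kT = K_B * temperature
    prefactor = 4.0 * math.pi * M_E * E_CHARGE / H**3
    return prefactor * kT**2 * math.exp(-phi_eV * E_CHARGE / kT)


# Coefficient of T^2 in (5.37), in A m^-2 K^-2.
A0 = 4.0 * math.pi * M_E * E_CHARGE * K_B**2 / H**3
print(f"A0 = {A0:.4e} A m^-2 K^-2")

# Illustrative operating point (assumption): phi = 4.5 eV, T = 2700 K.
print(f"j  = {richardson_current_density(4.5, 2700.0):.3e} A/m^2")
```

The exponential factor dominates the temperature dependence, so that a modest change in temperature or work function changes the current density by a large factor.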
5.4 Field emission


Electrons can be extracted from the surface of a bulk metal by 
applying an electric field. This process is known as field emission 
or cold emission. It relies on the quantum-mechanical process of a 
single electron tunneling through a potential barrier. This process 
takes place even at very low temperature. Our goal is to obtain 
an explicit expression for the emission current density $j$ in terms
of the applied field $F$, and the work function $\phi$, in the limit of low
temperature T. This was first understood theoretically in 1927 by 
Fowler and Nordheim [31]. This source is now widely used in a 
variety of practical instruments. It is distinguished by its excep- 
tionally high brightness. In the following sections we proceed to 
derive the current density from first principles of quantum me- 
chanics. 


For the present purpose we can consider a conduction electron
to be free inside the bulk metal at x < 0. The potential energy
is thus given here by U(x) = 0. Outside the metal at x > 0, the
potential energy is given by

$$ U(x) = C - Fx - \frac{e^2}{16\pi\epsilon_0\, x}, \tag{5.38} $$


where the first term on the right is a constant, associated with the 
energy in the vacuum. The second term on the right is the po- 
tential energy associated with the applied electric field, where F 
is given by the product of the electron charge e times the electric 
field in volts per meter. In this notation F has units of force, which 
is equivalent to energy per unit distance. The third term on the 
right in (5.38) is the potential energy associated with the image 
force. This form for the potential energy was first understood by 
Nordheim [66]. 


For now we will ignore the third term, since it is relatively weak
in the limit of high electric field. This approximation was used by
Fowler and Nordheim [31]. The potential energy is plotted in this
approximation as a function of x in Figure 5.2. In a later section
we will include this image force term to improve the accuracy and
generality of the calculation.


Figure 5.2: Approximate energy diagram for cold field emission.


In the limit of zero absolute temperature T, conduction electrons
within the bulk material occupy all energy states up to the
maximum energy $\zeta$, which is the Fermi energy. No states above
this energy are occupied. Classically no emission can occur, because
the topmost filled energy level is below the top of the potential
energy barrier. Quantum mechanically a conduction electron
incident on the barrier from within the bulk can tunnel through the
barrier, and be emitted into the vacuum.


We designate D(W) as the probability that an electron with
energy W inside the material will tunnel through the potential
barrier. Again we designate the incident current density per unit
energy from within the material as J(W). We seek the emission
current density, which is the incident current density J(W) times
the transmission probability D(W), integrated over all possible
energies W, where $0 < W < \zeta$. This is

$$ j = \int_0^\zeta dW\, J(W)\, D(W), \tag{5.39} $$

where $\zeta$ is the Fermi energy.


To find the field emission current density j, we must first solve 
for the transmission probability D(W). All relevant information is 
contained in the stationary state spatial wave function u(x). This 
is found by solving Schrödinger's equation separately inside and
outside the bulk metal for u(x), and matching the two solutions 
u(x) and their first derivatives u’(x) at the emission surface x = 0. 


Considering first the region interior to the bulk metal at x < 0,
only the energy in the x-direction is relevant, as the transverse
component has no effect. Schrödinger's equation is given from the
preceding analysis in Chapter 2 as

$$ \left( \frac{d^2}{dx^2} + k^2 \right) u(x) = 0. \tag{5.40} $$


According to the earlier analysis, the wave vector k is related to
the scalar kinetic momentum p by

$$ p = \hbar k = +\sqrt{2mW}, \tag{5.41} $$

where W is the total energy of a single electron in the x-direction.
The state function is the sum of two linearly independent solutions
$u_\pm(x)$ for the eigenfunction u(x),

$$ u_\pm(x) = a_\pm e^{\pm i k x}, \tag{5.42} $$

where $a_\pm$ represent two arbitrary complex constants. One can
immediately verify the solutions $u_\pm(x)$ by direct substitution
into the differential equation.

The total energy eigenvalue W is the same for both solutions,
and is given by

$$ W = \frac{\hbar^2 k^2}{2m} = \hbar\omega. \tag{5.43} $$


For now we consider a single, specific energy W with associated 
wave vector k. 


The probability current is

$$ j(x) = \frac{i\hbar}{2m} \left[ u(x)\, \bar{u}'(x) - \bar{u}(x)\, u'(x) \right], \tag{5.44} $$

where a bar over a quantity indicates complex conjugation.
Substituting u(x) above, we identify right- and left-propagating
currents

$$ j_+(x<0) = \frac{\hbar k}{m}\, |a_+|^2, \qquad j_-(x<0) = -\frac{\hbar k}{m}\, |a_-|^2, \tag{5.45} $$

respectively, where $j_+$ is the current incident on the potential
barrier from left to right inside the bulk at x < 0, and $j_-$ is the
current reflected by the barrier from right to left. The algebraic
sum of these is the tunneling current transmitted through the
barrier.


For x > 0, the vacuum, Schrödinger's equation is

$$ \left[ -\frac{\hbar^2}{2m} \frac{d^2}{dx^2} + U(x) \right] u(x) = W\, u(x), \tag{5.46} $$

where the potential energy U(x) is given approximately by

$$ U(x) \approx C - Fx. \tag{5.47} $$

The quantity F is the electron charge e times the applied electric
field, and has the dimension of force, measured in newtons.
Equivalently we write Schrödinger's equation as

$$ \frac{d^2}{dx^2} u(x) + \frac{2m}{\hbar^2} \left[ W - C + Fx \right] u(x) = 0. \tag{5.48} $$

We rewrite this as

$$ \frac{d^2}{dx^2} u(x) + \alpha^3 (x - \beta)\, u(x) = 0, \tag{5.49} $$

where we have defined the constants

$$ \alpha^3 = \frac{2mF}{\hbar^2}, \qquad \beta = \frac{C - W}{F}. \tag{5.50} $$

We define a new variable y(x) as

$$ y(x) = \alpha(\beta - x). \tag{5.51} $$

The differential equation for u(x) is thus transformed into

$$ \left( \frac{d^2}{dy^2} - y \right) Y(y) = 0, \tag{5.52} $$

where we have defined the eigenfunction Y(y) according to

$$ Y(y) = u[x(y)]. \tag{5.53} $$


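Two linearly independent solutions for Y(y) exist, and are
designated

$$ Y(y) = \mathrm{Ai}(y), \qquad Y(y) = \mathrm{Bi}(y). \tag{5.54} $$

The functions Ai and Bi are called Airy functions. Their properties
are well-known [1]. The vacuum is represented by large positive
values of x, corresponding to large negative values of y. The Airy
functions have asymptotic forms for $y \ll 0$ given by

$$ \begin{aligned} \mathrm{Ai}(y) &\approx \frac{1}{\sqrt{\pi}\, (-y)^{1/4}} \sin\left[ \tfrac{2}{3} (-y)^{3/2} + \tfrac{\pi}{4} \right], \\ \mathrm{Bi}(y) &\approx \frac{1}{\sqrt{\pi}\, (-y)^{1/4}} \cos\left[ \tfrac{2}{3} (-y)^{3/2} + \tfrac{\pi}{4} \right]. \end{aligned} \tag{5.55} $$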
We form the linear combination

$$ Y(y) = b_+ \left[ \mathrm{Bi}(y) + i\, \mathrm{Ai}(y) \right], \tag{5.56} $$

defined for all y, where $b_+$ is an arbitrary complex constant. For y
large and negative, Y(y) has the asymptotic form

$$ Y(y) \approx b_+\, \frac{1}{\sqrt{\pi}\, (-y)^{1/4}} \exp\left\{ i \left[ \tfrac{2}{3} (-y)^{3/2} + \tfrac{\pi}{4} \right] \right\}. \tag{5.57} $$

This represents a wave which propagates to the right in the co- 
ordinate x in the vacuum. This is a necessary condition, since we 
must assume no left-propagating wave can exist in the vacuum. 
We therefore adopt this form as our solution Y (y) for all y. We 
further define the probability current J(y) as

$$ J(y) = J[y(x)] = j(x), \tag{5.58} $$

where j(x) is the current defined above. We notice from (5.44)
that the wave function must have an imaginary part in order to
have nonzero probability current. The wave function Y(y) satisfies
this requirement. Substituting, we obtain

$$ J(y) = -\frac{i\hbar\alpha}{2m} \left[ Y(y)\, \bar{Y}'(y) - \bar{Y}(y)\, Y'(y) \right]. \tag{5.59} $$

The first derivative Y'(y) is given by

$$ Y'(y) = b_+ \left[ \mathrm{Bi}'(y) + i\, \mathrm{Ai}'(y) \right]. \tag{5.60} $$

It is straightforward to evaluate the current J(y), noticing that
Ai(y) and Bi(y) are real-valued for y real. After some algebra we
obtain

$$ J(y) = \frac{\hbar\alpha}{m}\, |b_+|^2 \left[ \mathrm{Ai}(y)\, \mathrm{Bi}'(y) - \mathrm{Ai}'(y)\, \mathrm{Bi}(y) \right]. \tag{5.61} $$

The quantity in square brackets is the conserved Wronskian, and
has the value $\pi^{-1}$. The current J(y) reduces to

$$ J(y) = \frac{\hbar\alpha}{\pi m}\, |b_+|^2. \tag{5.62} $$
This is independent of coordinate, as required by the fact that it
is proportional to the conserved Wronskian. We identify

$$ j(x > 0) = J[y(x)] = \frac{\hbar\alpha}{\pi m}\, |b_+|^2 \tag{5.63} $$



as the tunneling current propagating from left to right for x > 0. 
This result will prove useful later. 


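Next we must match the solutions and their derivatives at x = 0,
which is the emission surface. For x < 0 inside the bulk material
we form the solution and its first derivative as

$$ \begin{aligned} u(x) &= a_+ e^{ikx} + a_- e^{-ikx}, \\ u'(x) &= ik \left[ a_+ e^{ikx} - a_- e^{-ikx} \right]. \end{aligned} \tag{5.64} $$

For x > 0 in the vacuum we form

$$ \begin{aligned} u(x) &= Y[y(x)], \\ u'(x) &= -\alpha\, Y'(y). \end{aligned} \tag{5.65} $$

Matching the solutions and first derivatives at x = 0, equivalently
$y = \alpha\beta$, we have two simultaneous equations,

$$ \begin{aligned} a_+ + a_- &= Y(\alpha\beta), \\ a_+ - a_- &= \frac{i\alpha}{k}\, Y'(\alpha\beta). \end{aligned} \tag{5.66} $$

Solving for $a_+$ and $a_-$ we find

$$ \begin{aligned} a_+ &= \tfrac{1}{2} Y(\alpha\beta) + \frac{i\alpha}{2k}\, Y'(\alpha\beta), \\ a_- &= \tfrac{1}{2} Y(\alpha\beta) - \frac{i\alpha}{2k}\, Y'(\alpha\beta). \end{aligned} \tag{5.67} $$

Substituting for Y and Y' above, we find

$$ \begin{aligned} a_+ &= b_+ \left\{ \tfrac{1}{2} \left[ \mathrm{Bi}(\alpha\beta) + i\, \mathrm{Ai}(\alpha\beta) \right] + \frac{i\alpha}{2k} \left[ \mathrm{Bi}'(\alpha\beta) + i\, \mathrm{Ai}'(\alpha\beta) \right] \right\}, \\ a_- &= b_+ \left\{ \tfrac{1}{2} \left[ \mathrm{Bi}(\alpha\beta) + i\, \mathrm{Ai}(\alpha\beta) \right] - \frac{i\alpha}{2k} \left[ \mathrm{Bi}'(\alpha\beta) + i\, \mathrm{Ai}'(\alpha\beta) \right] \right\}. \end{aligned} \tag{5.68} $$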
The fraction of electrons incident on the barrier from within the
bulk, which tunnel through the barrier to be emitted into the
vacuum is given by

$$ D(W) = \frac{j(x > 0)}{j_+(x < 0)}, \tag{5.69} $$

where this depends on the energy W. Substituting the above
expressions for the two currents, this is

$$ D(W) = \frac{\alpha}{\pi k}\, \frac{|b_+|^2}{|a_+|^2}. \tag{5.70} $$

Evaluating the absolute square of the coefficients, this becomes

$$ D(W) = \frac{4\alpha}{\pi k} \left\{ \mathrm{Ai}^2(\alpha\beta) + \mathrm{Bi}^2(\alpha\beta) + \frac{2\alpha}{\pi k} + \frac{\alpha^2}{k^2} \left[ \mathrm{Ai}'^2(\alpha\beta) + \mathrm{Bi}'^2(\alpha\beta) \right] \right\}^{-1}, \tag{5.71} $$

where we have again made use of the conserved Wronskian.
Substituting from above,

$$ \frac{\alpha}{k} = (2m)^{-1/6}\, \hbar^{1/3}\, F^{1/3}\, W^{-1/2}. \tag{5.72} $$


It is left as an exercise for the reader, see Problems below, to sub- 
stitute some reasonable values. This represents a formal solution 
for the tunneling probability D(W). It is possible in principle to 
evaluate this numerically using the known series expansions for 
the Airy functions and their derivatives [1]. 


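Additional physical insight can be gained by approximating these
quantities. To this end we invoke the asymptotic forms for y > 0,

$$ \begin{aligned} \mathrm{Ai}(y) &\approx \frac{1}{2\sqrt{\pi}\, y^{1/4}} \exp\left( -\tfrac{2}{3} y^{3/2} \right), \\ \mathrm{Ai}'(y) &\approx -\frac{y^{1/4}}{2\sqrt{\pi}} \exp\left( -\tfrac{2}{3} y^{3/2} \right), \\ \mathrm{Bi}(y) &\approx \frac{1}{\sqrt{\pi}\, y^{1/4}} \exp\left( \tfrac{2}{3} y^{3/2} \right), \\ \mathrm{Bi}'(y) &\approx \frac{y^{1/4}}{\sqrt{\pi}} \exp\left( \tfrac{2}{3} y^{3/2} \right), \end{aligned} \tag{5.73} $$

where we set $y = \alpha\beta$ at the interface between the bulk and
the vacuum. We implicitly assume that $y \gg 0$. Substituting these
asymptotic forms into the above expression for D(W), we see that
the terms in Bi and Bi' dominate. Retaining only these terms, we
obtain the approximation

$$ D(W) \approx \frac{4\, (\alpha/k)\, (\alpha\beta)^{1/2}}{1 + (\alpha/k)^2\, \alpha\beta}\, \exp\left[ -\tfrac{4}{3} (\alpha\beta)^{3/2} \right]. \tag{5.74} $$

Substituting for $\alpha\beta$ and $\alpha/k$ above, this leads
immediately to an expression for the tunneling probability,

$$ D(W) \approx \frac{4\sqrt{W(C - W)}}{C}\, \exp\left[ -\frac{4}{3} \left( \frac{2m}{\hbar^2} \right)^{1/2} \frac{(C - W)^{3/2}}{F} \right]. \tag{5.75} $$
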
This is the probability that an electron with energy W will tunnel 
through the barrier. This represents one of the main results of this 
section. We are now in a position to calculate the emission current 
density j based on (5.1, 5.19, 5.75). The field emission current 
density j is given from (5.1) as 


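$$ j = \int_0^\zeta dW\, D(W)\, J(W), \tag{5.76} $$

where only states with $0 < W < \zeta$ are occupied in the limit
$T \to 0$. Substituting for D(W) and J(W) this becomes

$$ j = \frac{16\pi e m}{h^3 C} \int_0^\zeta dW\, \sqrt{W(C - W)}\, (\zeta - W)\, \exp\left[ -\frac{4}{3} \left( \frac{2m}{\hbar^2} \right)^{1/2} \frac{(C - W)^{3/2}}{F} \right]. \tag{5.77} $$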
We define a variable $\xi$ as

$$ \xi = (C - W)^{3/2}. \tag{5.78} $$


The integral (5.76) now takes the form 


Ge fa Ulje, (5.79) 


where $U(\xi)$ is not to be confused with the potential energy above.
This integral can be performed in principle, since the integrand is 
well-behaved over the range of integration [31]. A useful approxi- 
mation can be obtained from the series representation 


a a2 


Drees 5 


(5.80) 
which the reader can immediately verify by differentiating both
sides with respect to $\xi$. The first term in the series vanishes,
because the function U vanishes at the two end points. The second
and successive terms are infinite, owing to the factor $\sqrt{W}$ in
(5.76). We therefore approximate $\sqrt{W} \approx \sqrt{\zeta}$ and
take this factor outside the integral for the second and higher
terms only. Taking the second term only, it is straightforward to
show that the field emission current density $j$ is given
approximately by


$$ j \approx \frac{e}{2\pi h}\, \frac{\zeta^{1/2}}{(\zeta + \phi)\, \phi^{1/2}}\, F^2 \exp\left[ -\frac{4}{3} \left( \frac{2m}{\hbar^2} \right)^{1/2} \frac{\phi^{3/2}}{F} \right], \tag{5.81} $$

where we have made use of the definition of the work function $\phi$
as

$$ \phi = C - \zeta, \tag{5.82} $$

and $\zeta$ is the Fermi energy. As a reminder, F is the electron
charge e times the electric field in volts per meter. It has units
of energy per unit length, or joules per meter in this notation.
The equation (5.76) and its approximation (5.81) represent the main
results of this section. The approximation is precisely the result
given by Fowler and Nordheim [31]. It is left as an exercise for
the reader to complete the details of this derivation.
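
The quality of the approximation (5.81) can be judged by comparing it with a direct numerical evaluation of (5.76), using the tunneling probability (5.75) and the incident current density (5.19). The sketch below is one way to do this in SI units. The function names, the Fermi energy of 8 eV, the work function of 4.5 eV, and the field of 5 V/nm are illustrative assumptions, and the quadrature is a plain composite Simpson rule.

```python
import math

M_E = 9.1093837015e-31       # electron mass, kg
E_CHARGE = 1.602176634e-19   # elementary charge, C
H = 6.62607015e-34           # Planck constant, J s
HBAR = H / (2.0 * math.pi)


def exponent_coefficient(F):
    """(4/3) (2m/hbar^2)^(1/2) / F, the coefficient in the exponent of (5.75)."""
    return (4.0 / 3.0) * math.sqrt(2.0 * M_E) / (HBAR * F)


def tunneling_probability(W, C, F):
    """D(W) of (5.75); W and C in joules, F = e * field in newtons."""
    return (4.0 * math.sqrt(W * (C - W)) / C
            * math.exp(-exponent_coefficient(F) * (C - W) ** 1.5))


def j_numeric(zeta, phi, F, n=20000):
    """Simpson evaluation of (5.76), with J(W) taken from (5.19); result in A/m^2."""
    C = zeta + phi
    supply = 4.0 * math.pi * M_E * E_CHARGE / H**3

    def integrand(W):
        return supply * (zeta - W) * tunneling_probability(W, C, F)

    h = zeta / n
    total = integrand(0.0) + integrand(zeta)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return total * h / 3.0


def j_fowler_nordheim(zeta, phi, F):
    """Closed-form approximation (5.81); result in A/m^2."""
    return (E_CHARGE / (2.0 * math.pi * H)
            * math.sqrt(zeta / phi) / (zeta + phi)
            * F**2 * math.exp(-exponent_coefficient(F) * phi ** 1.5))


# Illustrative parameters (assumptions): zeta = 8 eV, phi = 4.5 eV, field = 5 V/nm.
zeta = 8.0 * E_CHARGE
phi = 4.5 * E_CHARGE
F = E_CHARGE * 5.0e9
print("numerical (5.76):  ", j_numeric(zeta, phi, F))
print("closed form (5.81):", j_fowler_nordheim(zeta, phi, F))
```

For parameters of this order the two values agree to within several percent, which reflects the fact that the emitted electrons originate from a narrow band of energies just below the Fermi energy.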
Problems
1. Complete the details of the derivation of (5.81).


2. The work function for tungsten is 4.5 electron-volts. Estimate
the field F required for the onset of field emission from tungsten. 
Describe the functional dependence of the current density j on F 
for F higher and lower than this onset value. 


5.5 Emission with elevated tempera- 
ture and field 


In the preceding sections we explored thermionic emission, and 
separately cold field emission. In this section we generalize the 
preceding concepts to calculate the emission current density as a 
function of temperature and applied electric field. We follow the 
general approach of Murphy and Good [64], which is based on ear- 
lier work by Kemble [52]. 


Again we make use of (5.1) for the emission current density j, 
and (5.18) for the incident current density per unit energy J(W). 
The present task is to calculate the transmission probability D(W) 
that an electron with total energy W in one dimension will tunnel 
through the potential barrier. 


The potential energy U(x) is given by (5.38), and is plotted as
a function of x in Figure 5.3. We intentionally include the image
potential term in the following. The potential energy U(x) is
assumed to join smoothly on both sides of the emission surface at
x = 0. In the vacuum (x > 0), the potential energy (5.38) has a
maximum value $U_m$ given by

$$ U_m = C - \sqrt{\frac{e^2 F}{4\pi\epsilon_0}}. \tag{5.83} $$


Figure 5.3: Energy diagram for emission with elevated temperature
and field.


The total energy W of a single electron inside the material (x < 0)
can now take on any nonnegative value $0 \le W < \infty$ depending on
the absolute temperature T. For $0 < W < U_m$ quantum mechanical
tunneling occurs. For $W > U_m$ no tunneling occurs, but the
elevated field F and temperature T act together to enhance the
emission.


The spatial part of the wave function u(x) satisfies Schrödinger's 
equation, which can be expressed in the form 

\frac{d^2 u}{dx^2} + \frac{p^2(x)}{\hbar^2}\, u(x) = 0, (5.84) 

where p(x) is the kinetic momentum given in one dimension by 

p(x) = \pm\sqrt{2m\,[W - U(x)]}. (5.85) 

We choose the positive root without loss of generality. 


All relevant information is contained in the wave function u(x). 
We assume without loss of generality that the wave function can 
be expressed as 


u(x) = \exp\left[\frac{i}{\hbar}\, S(x)\right], (5.86) 

where the function S(x) has yet to be determined. Substituting 
into Schrödinger's equation, we find after some algebra that 

\left(\frac{dS}{dx}\right)^2 - i\hbar\,\frac{d^2 S}{dx^2} - p^2 = 0. (5.87) 


In the classical limit where the term in ħ can be ignored, this 
reduces to the Hamilton-Jacobi equation in one spatial dimension, 
where the electromagnetic potentials have no explicit time 
dependence. We therefore identify S(x) with Hamilton's characteristic 
function. 
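
The algebra behind the equation for S(x) is short enough to show explicitly. The following worked step is a sketch that assumes Schrödinger's equation in the standard form u'' + (p^2/\hbar^2) u = 0 and the ansatz u(x) = \exp[i S(x)/\hbar]; primes denote d/dx.

```latex
\begin{align*}
u'  &= \frac{i}{\hbar}\, S'\, u, \\
u'' &= \left[\frac{i}{\hbar}\, S'' - \frac{1}{\hbar^2}\, S'^2\right] u, \\
0   &= u'' + \frac{p^2}{\hbar^2}\, u
     = -\frac{1}{\hbar^2}\left[ S'^2 - i\hbar\, S'' - p^2 \right] u .
\end{align*}
```

Since u(x) does not vanish identically, the bracket must vanish. Dropping the term proportional to ħ leaves S'^2 = p^2, which is the time-independent Hamilton-Jacobi equation in one dimension.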


As before we expand S(x) in a series in powers of ħ as 

S(x) = S_0(x) + \hbar\, S_1(x) + \hbar^2\, S_2(x) + \ldots (5.88) 

Substituting and collecting terms in the powers of ħ, we find 

\left(S_0'^2 - p^2\right) + \hbar \left(-i S_0'' + 2 S_0' S_1'\right) + \ldots = 0. (5.89) 


Considering ħ to be small but variable, the quantities within each 
of the parentheses must vanish separately. The first equation 
reduces to 

\frac{dS_0}{dx} = \pm p(x), (5.90) 

where p(x) is the kinetic momentum given above. Integrating between 
any two coordinates x_0 and x we find 

S_0(x) = S_0(x_0) \pm \int_{x_0}^{x} p(\xi)\, d\xi. (5.91) 


Substituting the second bracket in the series for S(x) we find 

\frac{dS_1}{dx} = \frac{i}{2p}\, \frac{dp}{dx}, (5.92) 

where we have made use of 

\frac{dS_0}{dx} = \pm p(x). (5.93) 

Integrating between the limits x_0 and x we find 

S_1(x) = S_1(x_0) + \frac{i}{2} \ln\left[\frac{p(x)}{p(x_0)}\right]. (5.94) 

Substituting above, and ignoring terms in ħ^2 and higher, we find 
the approximate solution for the wave function u(x) as 

u(x) \approx u(x_0) \left[\frac{p(x_0)}{p(x)}\right]^{1/2} \exp\left[\pm\frac{i}{\hbar} \int_{x_0}^{x} p(\xi)\, d\xi\right]. (5.95) 


This solution for the wave function u(x) represents the WKB 
approximation in one spatial dimension. This approximation is 
originally due to Wentzel, Kramers, and Brillouin. It is described 
in many books on quantum mechanics [79, 59]. The solution 
breaks down at the classical turning points where the momentum 
p(x) = 0, and is only a valid approximation at points remote 
from the turning points. 


We now consider the case where 0 < W < U_m, where tunneling 
occurs. The quantity p(x) is imaginary for x_1 < x < x_2, where 
U(x) > W, and real everywhere else. The wave function u(x) is 
approximated by 

u_+(x < x_1) = \frac{a_+}{p^{1/2}}\, e^{+i w(x)}, 
u_-(x < x_1) = \frac{a_-}{p^{1/2}}\, e^{-i w(x)}, 
u_+(x > x_2) = \frac{b_+}{p^{1/2}}\, e^{+i w(x)}, (5.96) 

where w(x) is defined as 

w(x) = \frac{1}{\hbar} \int_{x_0}^{x} p(\xi)\, d\xi. (5.97) 

This integral applies for any lower limit x_0. In the following we 
choose x_0 far to the left of the barrier. The solution u_+(x < x_1) 
is the right-propagating incident wave, u_-(x < x_1) is the 
left-propagating reflected wave, and u_+(x > x_2) is the 
right-propagating transmitted wave. While this approximation breaks 
down at the classical turning points x_1 and x_2, the function w(x) 
is well-behaved everywhere. 


In order to calculate the transmission probability D(W), we must 
calculate the probability current j for the incident, reflected, and 
transmitted waves. In general 


j = \frac{\hbar}{2 i m} \left[ u^*(x)\, u'(x) - u(x)\, u^{*\prime}(x) \right]. (5.98) 


We notice in (5.96) that u_+^*(x) = u_-(x) apart from constants. The 
current j is thus proportional to the conserved Wronskian, and is 
therefore conserved with respect to the coordinate x as required. 


Far from the classical turning points, and apart from constants, it 
is easily shown in this approximation that 

u = p^{-1/2}\, e^{i w}, 
u' = \left( -\frac{p'}{2p} + \frac{i p}{\hbar} \right) p^{-1/2}\, e^{i w}, 
u^* = (p^*)^{-1/2}\, e^{-i w^*}, 
u^{*\prime} = \left( -\frac{p^{*\prime}}{2p^*} - \frac{i p^*}{\hbar} \right) (p^*)^{-1/2}\, e^{-i w^*}, (5.99) 

where we notice that p and w are both either pure real or pure 
imaginary. After some algebra we arrive at a general expression 
for the probability current j as 

j = \frac{p + p^*}{2m\, |p|}\, e^{i (w - w^*)}. (5.100) 


This is identical with the result obtained by Kemble [52]. Substituting, 
we arrive at the probability current as follows: 

j_+(x < x_1) = \frac{|a_+|^2}{m}, 
j_-(x < x_1) = -\frac{|a_-|^2}{m}, 
j_+(x > x_2) = \frac{|b_+|^2}{m} \exp\left( -\frac{2}{\hbar} \int_{x_1}^{x_2} |p(\xi)|\, d\xi \right). (5.101) 

In the third of these we have made use of 

w(x) = \frac{1}{\hbar} \left[ \int_{x_0}^{x_1} + \int_{x_1}^{x_2} + \int_{x_2}^{x} \right] p(\xi)\, d\xi (5.102) 

in the region x > x_2, with only the second integral leading to a 
nonzero contribution to j_+(x > x_2). 


The right-propagating solution u_+(x) must connect on both sides 
of the barrier. To ensure this we take 

|a_+|^2 = |b_+|^2 (5.103) 

in the above. The probability that a single electron with total 
energy W tunnels through the barrier is given by 

D(W) = \frac{j_+(x > x_2)}{j_+(x < x_1)}. (5.104) 

This leads immediately to 

D(W) = \exp\left( -\frac{2}{\hbar} \int_{x_1}^{x_2} |p(x)|\, dx \right) (5.105) 

in the present WKB approximation. Conservation of total current 
requires that 

j_+(x < x_1) + j_-(x < x_1) = j_+(x > x_2). (5.106) 

Dividing both sides by j_+(x < x_1), it follows that the probability 
that a single electron with total energy W is reflected by the 
barrier is 1 - D(W). 


Substituting for p(x), we write 

D(W) = \exp\left[ -\frac{2\sqrt{2m}}{\hbar} \int_{x_1}^{x_2} \sqrt{C - eFx - \frac{\kappa}{x} - W}\; dx \right], (5.107) 

where we have defined a constant 

\kappa = \frac{e^2}{16\pi\epsilon_0}. (5.108) 

This applies only to the tunneling case, where the total energy 
W is less than U_m. The limits x_1 and x_2 are the classical turning 
points where 

U(x) = W. (5.109) 

This leads to a quadratic equation with two roots 

x_{1,2} = \frac{C - W}{2eF} \left[ 1 \mp \sqrt{1 - \frac{4\kappa e F}{(C - W)^2}}\, \right]. (5.110) 

Setting κ = 0, we reproduce the tunneling probability for the 
wedge-shaped barrier (5.75), with the difference that the leading 
proportionality constant is set equal to unity. This is due to the 
fact that the present approach relies on the WKB approximation, 
whereas the earlier calculation is exact. 
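
The WKB tunneling exponent is easy to evaluate numerically, and the κ = 0 limit gives a convenient check. The following is a minimal sketch, not the author's procedure. It assumes the barrier has the form C - eFx - κ/x with κ = e^2/(16πε_0), works in units of eV and nm, and uses hypothetical names (`height` for C - W, `force` for eF) chosen only for this illustration.

```python
import math

# Physical constants in eV and nm (rounded CODATA values).
HBAR_C = 197.3269804          # hbar * c  [eV nm]
ME_C2 = 510998.95             # electron rest energy m c^2  [eV]
KAPPA = 1.439964548 / 4.0     # e^2 / (16 pi eps0)  [eV nm]


def turning_points(height, force, kappa):
    """Roots of height - force*x - kappa/x = 0, where height = C - W."""
    disc = 1.0 - 4.0 * kappa * force / height**2
    if disc <= 0.0:
        raise ValueError("W is at or above the barrier maximum: no tunneling region")
    root = math.sqrt(disc)
    scale = height / (2.0 * force)
    return scale * (1.0 - root), scale * (1.0 + root)


def wkb_transmission(height, force, kappa=KAPPA, n=20000):
    """WKB estimate D = exp(-(2/hbar) * integral of |p| dx) between the turning points.

    height : C - W in eV
    force  : e*F in eV/nm (numerically equal to the field in V/nm)
    """
    x1, x2 = turning_points(height, force, kappa)
    mid, half = 0.5 * (x1 + x2), 0.5 * (x2 - x1)
    dtheta = math.pi / n
    total = 0.0
    for i in range(n):
        # The substitution x = mid - half*cos(theta) smooths the square-root end points.
        theta = (i + 0.5) * dtheta
        x = mid - half * math.cos(theta)
        excess = height - force * x - kappa / x      # U(x) - W  [eV]
        total += math.sqrt(max(excess, 0.0)) * half * math.sin(theta) * dtheta
    # 2*sqrt(2m)/hbar = 2*sqrt(2 m c^2)/(hbar c)  [1/(nm sqrt(eV))]
    return math.exp(-2.0 * math.sqrt(2.0 * ME_C2) / HBAR_C * total)


def wedge_transmission(height, force):
    """Closed-form WKB result for kappa = 0 (wedge-shaped barrier)."""
    return math.exp(-4.0 * math.sqrt(2.0 * ME_C2) * height**1.5 / (3.0 * HBAR_C * force))
```

For example, `wkb_transmission(4.5, 5.0, kappa=0.0)` should agree with `wedge_transmission(4.5, 5.0)`, while `wkb_transmission(4.5, 5.0)` is larger because the image term lowers and narrows the barrier.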


We now define two new quantities 

\rho = \frac{2eFx}{C - W}, 
y = \frac{2\sqrt{\kappa e F}}{C - W}. (5.111) 

Substituting and performing some algebra we find 

D(W) = \exp\left[ \frac{2i\sqrt{2m}\, \kappa^{3/4}}{\hbar\, (eF)^{1/4}\, y^{3/2}} \int_{1-\sqrt{1-y^2}}^{1+\sqrt{1-y^2}} \left( \rho - 2 + \frac{y^2}{\rho} \right)^{1/2} d\rho \right]. (5.112) 

Following Murphy and Good [64] we define a function v(y) as 

v(y) = -\frac{3i}{4\sqrt{2}} \int_{1-\sqrt{1-y^2}}^{1+\sqrt{1-y^2}} \left( \rho - 2 + \frac{y^2}{\rho} \right)^{1/2} d\rho. (5.113) 

This function can be expressed in terms of standard elliptic 
integrals. The reader is referred to a recent paper by Deane et al. [22] 
for a detailed and current discussion. The transmission probability 
D(W) is 

D(W) = \exp\left[ -\frac{16}{3}\, \frac{m^{1/2}\, \kappa^{3/4}}{\hbar\, (eF)^{1/4}\, y^{3/2}}\; v(y) \right] (5.114) 

for energies 0 < W < W_m. The transmission probability for 
energies W > W_m is D(W) = 1, which the reader can easily verify by 
applying the above procedure. 
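
The function v(y) can also be evaluated by direct quadrature, without elliptic integrals. The sketch below assumes the real form of the integral, v(y) = (3/(4√2)) ∫ (2 - ρ - y^2/ρ)^{1/2} dρ taken between ρ = 1 ∓ (1 - y^2)^{1/2}, which follows when the positive imaginary root is taken inside the barrier. The function name `v_barrier` and the substitution ρ = 1 - s cos θ with s = (1 - y^2)^{1/2} are choices made for this illustration only.

```python
import math


def v_barrier(y, n=4000):
    """Barrier function v(y) for 0 <= y <= 1 by midpoint quadrature.

    With rho = 1 - s*cos(theta) and s = sqrt(1 - y^2), the integrand
    sqrt(2 - rho - y^2/rho) d(rho) becomes s^2 sin^2(theta) / sqrt(rho) d(theta),
    which is smooth on 0 < theta < pi.
    """
    if not 0.0 <= y <= 1.0:
        raise ValueError("v(y) is evaluated here only for 0 <= y <= 1")
    s = math.sqrt(1.0 - y * y)
    dtheta = math.pi / n
    total = 0.0
    for i in range(n):
        theta = (i + 0.5) * dtheta
        rho = 1.0 - s * math.cos(theta)
        total += (s * math.sin(theta)) ** 2 / math.sqrt(rho) * dtheta
    return 3.0 / (4.0 * math.sqrt(2.0)) * total
```

Two limits are useful checks: v(0) = 1 recovers the wedge-shaped barrier, and v(1) = 0 corresponds to an electron at the top of the barrier, where the two turning points coincide.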


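We are now in a position to calculate the emission current density 
j for the general case of elevated temperature and field. From (5.1, 
5.18, 5.105) we find 

j = \frac{4\pi m e k T}{h^3} \int_0^{W_m} dW\, \ln\left[ \exp\left( \frac{\zeta - W}{kT} \right) + 1 \right] \exp\left[ -\frac{16}{3}\, \frac{m^{1/2}\, \kappa^{3/4}}{\hbar\, (eF)^{1/4}\, y^{3/2}}\; v(y) \right] 
  + \frac{4\pi m e k T}{h^3} \int_{W_m}^{\infty} dW\, \ln\left[ \exp\left( \frac{\zeta - W}{kT} \right) + 1 \right], (5.115) 

where the constants κ and y are defined in (5.108, 5.111), and the function 
v(y) is defined in (5.113). This represents the main result of this 
section. 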


Problems 


1. Calculate the transmission probability D(W) for the wedge- 
shaped barrier using the WKB approximation. Compare this re- 
sult with (5.75). 


2. Calculate the exact transmission and reflection probabilities 
for a square barrier of height U_0 and width 2a for the two cases 
W > U_0 and W < U_0. 


3. Calculate the transmission and reflection probabilities for a 
square barrier of height U_0, for the two cases W > U_0 and W < U_0 
using the WKB approximation. 


4. The present analysis assumes two linearly independent 
eigenfunctions u_+(x) and u_-(x). The eigenfunction u_+(x) represents a wave that 
is everywhere right-propagating. The eigenfunction u_-(x) represents 
a wave that is everywhere left-propagating. This leads to the 
form (5.105) for the transmission probability D(W). An earlier 
formulation by Kemble [52] assumes a different pair of linearly 
independent eigenfunctions f_1(x) and f_2(x). The eigenfunction f_1(x) 
represents a wave that is left-propagating to the left of the 
barrier (reflected wave), and right-propagating to the right of the 
barrier (transmitted wave). The eigenfunction f_2(x) represents a 
wave that is right-propagating to the left of the barrier (incident 
wave), and left-propagating to the right of the barrier (no wave). 
Show that this leads to an alternative form for the transmission 
probability D(W) given by 

D(W) = \left\{ 1 + \exp\left[ \frac{2}{\hbar} \int_{x_1}^{x_2} |p(x)|\, dx \right] \right\}^{-1}. (5.116) 

(Hint: Write down the analog of the connection formula (5.103) 
relating the coefficients of the eigenfunctions for the reflected and 
transmitted waves. This form for D(W) was assumed by Murphy 
and Good [64].) 


5.6 Space charge limited emission 


Emission of charged particles gives rise to a space charge cloud 
in front of the emission surface. We now investigate the condition 
where the space charge is sufficiently high to suppress the emission. 
We imagine two parallel plates of infinite extent, separated by 
a distance s. The emission surface is at zero potential, and the 
accelerating anode is at potential U_a. We wish to find an expression 
for the current density j in the space between the plates, as a 
function of the accelerating potential U_a and the spacing s. This 
is given by 

j = \rho(x)\, v(x), (5.117) 
where ρ is the space charge density, and v is the particle speed. 
Charge conservation dictates that the current density j is indepen- 
dent of x. The electrostatic potential U(x) is governed by Poisson’s 
equation, which is given in one dimension as 

\frac{d^2 U}{dx^2} = -\frac{\rho(x)}{\epsilon_0}. (5.118) 

The particle speed is given by energy conservation as 

v(x) = \sqrt{\frac{2e\, U(x)}{m}}. (5.119) 

Substituting, we obtain a differential equation for the potential 
U(x) as 

U''(x) = \frac{a}{\sqrt{U(x)}}, (5.120) 

where we have defined the constant a as 

a = -\frac{j}{\epsilon_0} \sqrt{\frac{m}{2e}}. (5.121) 

We now make use of the fact that 

\frac{d}{dx}\left( U'^2 \right) = 2\, U'\, U'' = 2\, U''\, \frac{dU}{dx}. (5.122) 

This yields the differential equation 

d\left( U'^2 \right) = 2a\, \frac{dU}{\sqrt{U}}. (5.123) 

This is integrated immediately to yield 

U'^2 = 4a \sqrt{U} + \mathrm{const}. (5.124) 


At this point we invoke the condition that the field is zero at 
the emission surface at cutoff. Mathematically, this equilibrium 
is expressed as U'(0) = 0. From before we also have U(0) = 0, 
in which case the integration constant is zero. Taking the square 
root of both sides, we obtain 

U^{-1/4}\, dU = 2 \sqrt{a}\; dx. (5.125) 

Integrating the left side between the limits U = 0 and U = U_a, 
and integrating the right side between the limits x = 0 and x = s, 
we find 

\frac{4}{3}\, U_a^{3/4} = 2 \sqrt{a}\; s. (5.126) 

Squaring both sides, substituting for a, and rearranging factors, 
we obtain the desired expression for the magnitude of the current 
density j as 

|j| = \frac{4\epsilon_0}{9} \sqrt{\frac{2e}{m}}\; \frac{U_a^{3/2}}{s^2}. (5.127) 


This is known as the Child—Langmuir equation for space charge 
limited emission. 
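
To give a feeling for the magnitudes involved, the space charge limited current density can be evaluated for electrons. This is a minimal sketch assuming SI units and the standard form |j| = (4ε_0/9)(2e/m)^{1/2} U_a^{3/2}/s^2; the function names are hypothetical. The second function returns the potential profile U(x) = U_a (x/s)^{4/3}, which satisfies U(0) = 0, U'(0) = 0 and U(s) = U_a.

```python
import math

EPS0 = 8.8541878128e-12       # vacuum permittivity  [F/m]
E_CHARGE = 1.602176634e-19    # elementary charge  [C]
M_E = 9.1093837015e-31        # electron mass  [kg]


def child_langmuir_current_density(u_a, s, q=E_CHARGE, m=M_E):
    """Magnitude of the space charge limited current density in A/m^2.

    u_a : accelerating potential in volts
    s   : plate spacing in metres
    """
    return (4.0 * EPS0 / 9.0) * math.sqrt(2.0 * q / m) * u_a**1.5 / s**2


def space_charge_potential(x, u_a, s):
    """Potential between the plates at cutoff, U(x) = U_a (x/s)^(4/3)."""
    return u_a * (x / s) ** (4.0 / 3.0)
```

For U_a = 1 V and s = 1 m the result is the familiar constant of about 2.33e-6 A V^{-3/2}, so for example 1000 V across 1 mm gives roughly 7.4e4 A/m^2.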


Appendix A 


The Fourier transform 


As a mathematical method, the Fourier transform provides a pow- 
erful, simplifying tool for a variety of physical problems. This de- 
rives from the fact that a Fourier transform of a function represents 
the spectral density of the function in the frequency domain. It is 
a special case in the general theory of Hilbert spaces. Rather than 
attempt a complete description of this theory, we will confine our 
attention here only to those aspects that are directly applicable to 
the present study. 


We consider an arbitrary complex function f(x), defined over the 
range -∞ < x < +∞. We define the Fourier transform \tilde{f}(k) as 

\tilde{f}(k) = \int_{-\infty}^{\infty} dx\, e^{-ikx} f(x), (A.1) 

where k is called the transform variable, and in general -∞ < k < 
+∞. We assume that the function f(x) is such that the integral 
is finite. This is true for most problems of physical interest, where 
f is well-behaved in this sense. Operating on both sides from the 
left by 

\frac{1}{2\pi} \int_{-\infty}^{\infty} dk\, e^{ikx'} (A.2) 

we obtain 

\frac{1}{2\pi} \int_{-\infty}^{\infty} dk\, e^{ikx'} \tilde{f}(k) = \int_{-\infty}^{\infty} dx\, f(x) \left[ \frac{1}{2\pi} \int_{-\infty}^{\infty} dk\, e^{-ik(x - x')} \right], (A.3) 

where we have reversed the order of integrations on the right side. 
We recognize the large bracket as an integral representation of the 
Dirac delta function, namely 

\frac{1}{2\pi} \int_{-\infty}^{\infty} dk\, e^{-ik(x - x')} = \delta(x - x'), (A.4) 

where, for a certain broad class of well-behaved functions f(x), 
this has the property 

\int_{-\infty}^{\infty} dx\, f(x)\, \delta(x - x') = f(x'). (A.5) 

It follows that 

f(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} dk\, \tilde{f}(k)\, e^{ikx}. (A.6) 

Evidently, this represents the inverse Fourier transform, as it 
reproduces the original function f(x). The transform (A.1) together 
with its inverse (A.6) thus form an intimately related pair. Because 
\tilde{f}(k) multiplies the phase factor on the right side, it represents the 
spectral density of f(x) with respect to the frequency k, where k 
has the dimensions x^{-1}. Evaluating the transform at zero 
argument, it follows immediately that 

\tilde{f}(0) = \int_{-\infty}^{\infty} dx\, f(x). (A.7) 

Evaluating the transform \tilde{f} at zero argument gives the integral of 
the function f over its whole range. This property will turn out to 
be very useful. 


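We now derive several other useful properties. We define the 
convolution h(x) of two functions f(x) and g(x) as the integral 

h(x) = \int_{-\infty}^{\infty} dx'\, f(x')\, g(x - x'). (A.8) 

This operation is often abbreviated by 

h(x) = f(x) * g(x). (A.9) 

Substituting the inverse transforms for f and g, we find 

h(x) = \int_{-\infty}^{\infty} dx' \left[ \frac{1}{2\pi} \int_{-\infty}^{\infty} dk\, \tilde{f}(k)\, e^{ikx'} \right] \left[ \frac{1}{2\pi} \int_{-\infty}^{\infty} dk'\, \tilde{g}(k')\, e^{ik'(x - x')} \right]. (A.10) 

Interchanging the order of integrations, this becomes 

h(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} dk\, \tilde{f}(k) \int_{-\infty}^{\infty} dk'\, \tilde{g}(k')\, e^{ik'x} \left[ \frac{1}{2\pi} \int_{-\infty}^{\infty} dx'\, e^{-i(k' - k)x'} \right]. (A.11) 

We recognize the quantity in square brackets as an integral 
representation of the Dirac delta function δ(k' - k). This leads to 

h(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} dk\, \tilde{f}(k)\, \tilde{g}(k)\, e^{ikx}. (A.12) 

From the definition of the inverse transform of h(x) this immediately 
yields 

\tilde{h}(k) = \tilde{f}(k)\, \tilde{g}(k). (A.13) 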
In words, the Fourier transform of a convolution of two functions 
is equal to the product of the Fourier transforms of the two func- 
tions. This general result is called the convolution theorem. 
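
The convolution theorem is easy to confirm numerically for a pair of Gaussians. The sketch below assumes the sign convention exp(-ikx) for the forward transform; the truncated integration range, the grid size, and the particular test functions are choices made for this illustration only.

```python
import cmath
import math

LO, HI, N = -10.0, 10.0, 400
DX = (HI - LO) / N
GRID = [LO + (i + 0.5) * DX for i in range(N)]   # midpoint grid


def fourier(func, k):
    """Midpoint-rule estimate of the integral of exp(-i k x) func(x) dx."""
    return sum(cmath.exp(-1j * k * x) * func(x) for x in GRID) * DX


def convolve(f, g, x):
    """Midpoint-rule estimate of the integral of f(x') g(x - x') dx'."""
    return sum(f(xp) * g(x - xp) for xp in GRID) * DX


def f(x):
    return math.exp(-x * x)


def g(x):
    return math.exp(-0.5 * x * x)


def h(x):
    return convolve(f, g, x)


def check(k):
    """Return (transform of the convolution, product of the transforms) at k."""
    return fourier(h, k), fourier(f, k) * fourier(g, k)
```

Calling `check(k)` for a few values of k returns two numbers that agree to within the quadrature error, and both match the analytic product of the two Gaussian transforms.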


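Next we define the autocorrelation function F(x) of a function 
f(x) as the integral 

F(x) = \int_{-\infty}^{\infty} dx'\, f(x')\, f^*(x' - x), (A.14) 

where f^* denotes the complex conjugate of f. Substituting the 
inverse transforms for f and f^*, we find 

F(x) = \int_{-\infty}^{\infty} dx' \left[ \frac{1}{2\pi} \int_{-\infty}^{\infty} dk\, \tilde{f}(k)\, e^{ikx'} \right] \left[ \frac{1}{2\pi} \int_{-\infty}^{\infty} dk'\, \tilde{f}^*(k')\, e^{-ik'(x' - x)} \right]. (A.15) 

Interchanging the order of integrations, this becomes 

F(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} dk\, \tilde{f}(k) \int_{-\infty}^{\infty} dk'\, \tilde{f}^*(k')\, e^{ik'x} \left[ \frac{1}{2\pi} \int_{-\infty}^{\infty} dx'\, e^{-i(k' - k)x'} \right]. (A.16) 

We recognize the quantity in square brackets as an integral 
representation of the Dirac delta function δ(k' - k). This leads to 

F(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} dk\, \tilde{f}^*(k)\, \tilde{f}(k)\, e^{ikx}. (A.17) 

From the definition of the inverse transform of F(x) this immediately 
yields 

\tilde{F}(k) = |\tilde{f}(k)|^2. (A.18) 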
In words, the Fourier transform of the autocorrelation is equal to 
the absolute square of the transform of f. This general result is 
called the autocorrelation theorem. 


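Next we investigate the integral 

\int_{-\infty}^{\infty} dx\, |f(x)|^2 = \int_{-\infty}^{\infty} dx \left[ \frac{1}{2\pi} \int_{-\infty}^{\infty} dk\, \tilde{f}(k)\, e^{ikx} \right] \left[ \frac{1}{2\pi} \int_{-\infty}^{\infty} dk'\, \tilde{f}^*(k')\, e^{-ik'x} \right], (A.19) 

where we have substituted the inverse transforms of f and f^* on 
the right side. Interchanging the order of integrations we obtain 

\int_{-\infty}^{\infty} dx\, |f(x)|^2 = \frac{1}{2\pi} \int_{-\infty}^{\infty} dk\, \tilde{f}(k) \int_{-\infty}^{\infty} dk'\, \tilde{f}^*(k') \left[ \frac{1}{2\pi} \int_{-\infty}^{\infty} dx\, e^{-i(k' - k)x} \right]. (A.20) 

Again recognizing the square bracket as δ(k' - k), we immediately 
obtain 

\int_{-\infty}^{\infty} dx\, |f(x)|^2 = \frac{1}{2\pi} \int_{-\infty}^{\infty} dk\, |\tilde{f}(k)|^2. (A.21) 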


This result is known as Parseval’s theorem. 
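
Parseval's theorem, including its factor of 1/(2π), can be checked numerically in a few lines. This sketch assumes the forward transform is defined with exp(-ikx) and no prefactor, and uses the Gaussian f(x) = exp(-x^2/2) on a truncated grid; both sides should then equal π^{1/2}. The names are hypothetical.

```python
import cmath
import math

LO, HI, N = -10.0, 10.0, 400
STEP = (HI - LO) / N
GRID = [LO + (i + 0.5) * STEP for i in range(N)]   # used for both x and k


def f(x):
    return math.exp(-0.5 * x * x)


def transform(k):
    """Midpoint-rule estimate of the integral of exp(-i k x) f(x) dx."""
    return sum(cmath.exp(-1j * k * x) * f(x) for x in GRID) * STEP


def parseval_sides():
    """Return the integral of |f|^2 dx and (1/2pi) times the integral of |f~|^2 dk."""
    lhs = sum(abs(f(x)) ** 2 for x in GRID) * STEP
    rhs = sum(abs(transform(k)) ** 2 for k in GRID) * STEP / (2.0 * math.pi)
    return lhs, rhs
```

The two values returned by `parseval_sides()` agree with each other and with the analytic value π^{1/2}.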
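
All of the preceding results for one spatial dimension can directly 
be generalized to two dimensions. For a function f(x,y) defined 
in two Cartesian dimensions, we define the Fourier transform as 

\tilde{f}(k_x, k_y) = \int_{-\infty}^{\infty} dx \int_{-\infty}^{\infty} dy\, e^{-i(k_x x + k_y y)} f(x,y). (A.22) 

Operating on both sides from the left by 

\frac{1}{(2\pi)^2} \int_{-\infty}^{\infty} dk_x \int_{-\infty}^{\infty} dk_y\, e^{i(k_x x' + k_y y')} (A.23) 

we obtain, reversing the order of integrations 

\frac{1}{(2\pi)^2} \int_{-\infty}^{\infty} dk_x \int_{-\infty}^{\infty} dk_y\, e^{i(k_x x' + k_y y')} \tilde{f}(k_x, k_y) 
  = \int_{-\infty}^{\infty} dx \int_{-\infty}^{\infty} dy\, f(x,y) \left[ \frac{1}{2\pi} \int_{-\infty}^{\infty} dk_x\, e^{-ik_x(x - x')} \right] \left[ \frac{1}{2\pi} \int_{-\infty}^{\infty} dk_y\, e^{-ik_y(y - y')} \right] 
  = f(x', y'), (A.24) 

where we again have made use of the Dirac delta function. We 
thus obtain 

f(x,y) = \frac{1}{(2\pi)^2} \int_{-\infty}^{\infty} dk_x \int_{-\infty}^{\infty} dk_y\, e^{i(k_x x + k_y y)} \tilde{f}(k_x, k_y). (A.25) 

This represents the inverse Fourier transform in two Cartesian 
dimensions. 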


We now investigate what happens when we set one of the transform 
variable components, k_y, equal to zero, 

\tilde{f}(k_x, 0) = \int_{-\infty}^{\infty} dx\, e^{-ik_x x} \int_{-\infty}^{\infty} dy\, f(x,y). (A.26) 

We define the projection f_0(x) by integrating over one coordinate 
as follows: 

f_0(x) = \int_{-\infty}^{\infty} dy\, f(x,y), (A.27) 

from which it follows that 

\tilde{f}(k_x, 0) = \tilde{f}_0(k_x). (A.28) 

In words, setting one component of the transform variable to zero 
is equivalent to integrating over that degree of freedom in direct 
space (x,y). This will turn out to be very useful in reducing the 
number of degrees of freedom in problems of multiple variables. 
In particular, it greatly simplifies the problem of the stochastic 
Coulomb interaction in a charged particle beam. 

Continuing this process, it is straightforward to show that 

\tilde{f}(0,0) = \int_{-\infty}^{\infty} dx \int_{-\infty}^{\infty} dy\, f(x,y). (A.29) 

As in the case of one dimension, the transform \tilde{f}(k_x, k_y), evaluated 
at zero argument, represents the integral of the function f(x,y) 
over the entire direct space (x,y). 


Next, we consider the special case where f(x,y) is a function only 
of r = \sqrt{x^2 + y^2}, and is independent of azimuthal angle φ. The 
two-dimensional Fourier transform is 

\tilde{f}(k_x, k_y) = \int_0^{\infty} dr\, r f(r) \int_0^{2\pi} d\phi\, e^{-ikr\cos\phi}, (A.30) 

where k = \sqrt{k_x^2 + k_y^2} is the magnitude of the two-vector k. Here 
we have expressed the element of area dx dy = r dr dφ in polar 
coordinates. This reduces to 

\tilde{f}(k) = 2\pi \int_0^{\infty} dr\, r f(r)\, J_0(kr), (A.31) 

where J_0 is the zero-order Bessel function, for which an integral 
representation is given by 

J_0(x) = \frac{1}{2\pi} \int_0^{2\pi} d\phi\, e^{-ix\cos\phi}. (A.32) 

The above transform is often referred to as a Bessel transform. 
The transform \tilde{f} depends only on the magnitude of k. Following 
the same procedure, the inverse transform is readily found to be 

f(r) = \frac{1}{2\pi} \int_0^{\infty} dk\, k\, \tilde{f}(k)\, J_0(kr). (A.33) 


Thus, the radial symmetry of f leads to a simplification of the 
Fourier transform and its inverse transform. 

As an example, we consider the radially symmetric function 

f(r) = \frac{a^2}{\pi (r^2 + a^2)^2}. (A.34) 

Integrating over area, we find that f(r) is normalized to unity, 
where 

2\pi \int_0^{\infty} dr\, r f(r) = 1. (A.35) 

The Bessel transform is given by 

\tilde{f}(k) = ka\, K_1(ka), (A.36) 

where K_1 is the modified Bessel function. Here we have made use 
of the integral form 

\int_0^{\infty} \frac{J_\nu(bx)\, x^{\nu+1}\, dx}{(x^2 + a^2)^{\mu+1}} = \frac{a^{\nu-\mu}\, b^{\mu}}{2^{\mu}\, \Gamma(\mu + 1)}\, K_{\nu-\mu}(ab) (A.37) 

for the special case where ν = 0 and μ = 1, where Γ is the gamma 
function. 

Projecting f(r) onto one Cartesian axis, the x-axis, we form the 
function 

f_0(x) = \int_{-\infty}^{\infty} dy\, f\left( \sqrt{x^2 + y^2} \right) 
       = \frac{a^2}{\pi} \int_{-\infty}^{\infty} \frac{dy}{\left[ y^2 + (x^2 + a^2) \right]^2} 
       = \frac{a^2}{2\, (x^2 + a^2)^{3/2}}, (A.38) 

where we have made use of the form 

\int_{-\infty}^{\infty} \frac{dy}{(y^2 + b^2)^2} = \frac{\pi}{2 b^3}. (A.39) 

The one-dimensional Fourier transform is given by 

\tilde{f}_0(k_x) = k_x a\, K_1(k_x a). (A.40) 

These seemingly esoteric relationships are very useful in the 
theory of small angle plural scattering. 
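
The normalization and the projection of this example can be confirmed by direct numerical integration. The following sketch assumes the radial function f(r) = a^2/[π (r^2 + a^2)^2] and maps the infinite range onto a finite one with the substitution y = tan t; the function names are hypothetical.

```python
import math


def f_radial(r, a):
    """Radially symmetric test function a^2 / (pi (r^2 + a^2)^2)."""
    return a * a / (math.pi * (r * r + a * a) ** 2)


def integrate_real_line(func, n=4000):
    """Integrate func over (-inf, inf) using y = tan(t) and the midpoint rule in t."""
    dt = math.pi / n
    total = 0.0
    for i in range(n):
        t = -0.5 * math.pi + (i + 0.5) * dt
        y = math.tan(t)
        total += func(y) / math.cos(t) ** 2 * dt   # dy = dt / cos^2(t)
    return total


def projection(x, a):
    """Numerical projection onto the x-axis: integral over y of f(sqrt(x^2 + y^2))."""
    return integrate_real_line(lambda y: f_radial(math.hypot(x, y), a))


def normalization(a):
    """2 pi * integral_0^inf r f(r) dr, written as pi * integral over the real line of |r| f(|r|)."""
    return math.pi * integrate_real_line(lambda r: abs(r) * f_radial(r, a))
```

For any a > 0, `normalization(a)` should be close to 1, and `projection(x, a)` should reproduce a^2 / [2 (x^2 + a^2)^{3/2}].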


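The above arguments are easily extended to n dimensions, in which 
case the Fourier transform is defined as 

\tilde{f}(\mathbf{k}) = \int d^n x\, f(\mathbf{x})\, e^{-i \mathbf{k} \cdot \mathbf{x}}, (A.41) 

where x and k are n-vectors, and k · x = k_1 x_1 + ... + k_n x_n is the 
inner product. Applying the preceding logic, the inverse Fourier 
transform is found to be 

f(\mathbf{x}) = \frac{1}{(2\pi)^n} \int d^n k\, \tilde{f}(\mathbf{k})\, e^{i \mathbf{k} \cdot \mathbf{x}}. (A.42) 

The convolution theorem in n dimensions is found to be 

\tilde{h}(\mathbf{k}) = \tilde{f}(\mathbf{k})\, \tilde{g}(\mathbf{k}), (A.43) 

where the n-dimensional convolution is defined as 

h(\mathbf{x}) = \int d^n x'\, f(\mathbf{x}')\, g(\mathbf{x} - \mathbf{x}'), (A.44) 

and the integration is performed over all of space. The 
autocorrelation theorem in n dimensions is found to be 

\tilde{F}(\mathbf{k}) = |\tilde{f}(\mathbf{k})|^2, (A.45) 

where the n-dimensional autocorrelation function is defined as 

F(\mathbf{x}) = \int d^n x'\, f(\mathbf{x}')\, f^*(\mathbf{x}' - \mathbf{x}). (A.46) 

Parseval's theorem in n dimensions is found to be 

\int d^n x\, |f(\mathbf{x})|^2 = \frac{1}{(2\pi)^n} \int d^n k\, |\tilde{f}(\mathbf{k})|^2. (A.47) 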


This gives us all of the necessary tools to apply the powerful for- 
malism of Fourier analysis to practical problems of charged particle 
optics. 


Appendix B 


Linear second-order 
differential equation 


The paraxial ray equations (2.138, 2.162, 2.239) are examples of 
a more general linear second-order differential equation. We seek 
a solution for a function y(x), which satisfies an equation of the 
general form 

P(x)\, \frac{d^2 y}{dx^2} + Q(x)\, \frac{dy}{dx} + R(x)\, y = S(x), (B.1) 

where P, Q, R, and S are known functions of x. We shall see in the 
following that this applies directly to the problem of the chromatic 
aberration. 

In the case S = 0 the equation is designated homogeneous, and in 
the case S ≠ 0 the equation is designated inhomogeneous. The 
solution y_h(x) of the homogeneous equation can always be expressed as 
a linear combination of two functions u_1(x) and u_2(x) as follows: 

y_h(x) = c_1 u_1(x) + c_2 u_2(x), (B.2) 

where c_1 and c_2 are constants, and where u_1 and u_2 are not 
constant multiples of one another. The paraxial ray equation (2.162, 
2.239) for the transverse displacement v(z) = x(z) + i y(z) in the 
rotated system is an example of just such a homogeneous equation. 

We state the problem as follows: given a solution y_h(x) to the 
homogeneous equation, find a solution to the inhomogeneous 
equation. A theorem states that the solution to the inhomogeneous 
equation can always be expressed as the sum of the homogeneous 
solution, plus any particular solution to the inhomogeneous 
equation, i.e., 

y(x) = y_h(x) + y_p(x). (B.3) 


We now proceed to find a general solution for y(x), given y_h(x). 

For the particular solution y_p we postulate a trial function 

y_p(x) = C_1(x)\, u_1(x) + C_2(x)\, u_2(x), (B.4) 

where C_1 and C_2 have yet to be specified. In the following we adopt 
the notation 

y' = \frac{dy}{dx}, \qquad y'' = \frac{d^2 y}{dx^2}. (B.5) 

Differentiating (B.4), we find 

y_p' = C_1 u_1' + C_2 u_2' + C_1' u_1 + C_2' u_2, 
y_p'' = C_1 u_1'' + C_2 u_2'' + 2 C_1' u_1' + 2 C_2' u_2' + C_1'' u_1 + C_2'' u_2. (B.6) 

We are free to select one arbitrary condition on C_1 and C_2. We 
choose this to be 

C_1' u_1 + C_2' u_2 = 0, (B.7) 

thus eliminating the last two terms in y_p'. Differentiating (B.7), we 
find 

C_1' u_1' + C_2' u_2' + C_1'' u_1 + C_2'' u_2 = 0. (B.8) 

This reduces y_p'' to 

y_p'' = C_1 u_1'' + C_2 u_2'' + C_1' u_1' + C_2' u_2'. (B.9) 


Substituting the reduced y_p' and y_p'' into (B.1), we find 

C_1' u_1' + C_2' u_2' = \frac{S}{P}, (B.10) 

where we assume P is nonzero. Together with (B.7), this gives a 
pair of simultaneous equations for C_1' and C_2'. Solving this pair, 
we find 

C_1'(x) = -\frac{S(x)\, u_2(x)}{P(x)\, W(x)}, 
C_2'(x) = \frac{S(x)\, u_1(x)}{P(x)\, W(x)}, (B.11) 

where W(x) is the determinant defined as 

W(x) = u_1(x)\, u_2'(x) - u_1'(x)\, u_2(x). (B.12) 

The pair (B.11) can be integrated in principle to give 

C_1(x) = -\int \frac{S\, u_2}{P\, W}\, dx, 
C_2(x) = \int \frac{S\, u_1}{P\, W}\, dx, (B.13) 

from which it follows (B.2, B.4) that 

y(x) = \left( -\int \frac{S\, u_2}{P\, W}\, dx + c_1 \right) u_1(x) + \left( \int \frac{S\, u_1}{P\, W}\, dx + c_2 \right) u_2(x). (B.14) 

This represents the general solution to (B.1). 
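
The method can be tried on a simple case before applying it to the ray equation. The sketch below is an illustration under stated assumptions, not part of the original derivation: it evaluates the particular solution C_1(x) u_1(x) + C_2(x) u_2(x) by numerical quadrature, with both integrals started at a hypothetical lower limit `x0`, and the usage note applies it to y'' + y = x with u_1 = cos x and u_2 = sin x.

```python
import math


def particular_solution(x, u1, u2, du1, du2, p, s, x0=0.0, n=2000):
    """Particular solution by variation of parameters, using the midpoint rule.

    u1, u2   : two independent solutions of the homogeneous equation
    du1, du2 : their first derivatives
    p, s     : the coefficient P(x) and the inhomogeneous term S(x)
    """
    dx = (x - x0) / n
    c1 = 0.0
    c2 = 0.0
    for i in range(n):
        t = x0 + (i + 0.5) * dx
        w = u1(t) * du2(t) - du1(t) * u2(t)      # determinant W(t)
        c1 -= s(t) * u2(t) / (p(t) * w) * dx     # C1 = -integral of S u2 / (P W)
        c2 += s(t) * u1(t) / (p(t) * w) * dx     # C2 = +integral of S u1 / (P W)
    return c1 * u1(x) + c2 * u2(x)


def example(x):
    """y'' + y = x with u1 = cos, u2 = sin, P = 1, S = x, integrals started at 0."""
    return particular_solution(
        x, math.cos, math.sin,
        lambda t: -math.sin(t), math.cos,
        lambda t: 1.0, lambda t: t,
    )
```

With the integrals started at zero, `example(x)` returns x - sin x. This differs from the obvious particular solution y = x only by a multiple of the homogeneous solution sin x, which is absorbed into the constant c_2.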


We are now in a position to apply this directly to the problem 
of chromatic aberration in the case of axial symmetry. The 
inhomogeneous equation for δv_1(z) is (2.242). We identify the solution 
(B.2) to the homogeneous equation with 

\delta v_{1h}(z) = \delta v_{1o}\, g(z) + \delta v_{1o}'\, h(z), (B.15) 

where δv_{1o} = 0, as there is no aberration in the object plane. Also, 
P=1, and 

W =gR' -g'h = kp (z). (B.16) 
The inhomogeneous term S is given by (2.243). Substituting these 
into (B.14), the chromatic aberration in the Gaussian image plane 
is given by 


meis = Jl i a(z) Se) Bede: (B.17) 
where we have made use of g(z_i) = M and h(z_i) = 0. This is 
identical with (2.242). 